Data Acquisition and Preparation for Social Network Analysis Based on Email


Lessons  Learned 


Fred Ma
Joint Studies Operational Research Team

Dave Allen
Experimentation Operational Research Team

Patrick Dooley
Canada Command Operational Research Team


DRDC  CORA  TM  2009-030 
June  2009 


Defence  R&D  Canada 

Centre  for  Operational  Research  &  Analysis 


Joint  Staff  Operational  Research  Team 



Defence  R&D  Canada  -  CORA 

Technical  Memorandum 


Principal  Author 

Original  signed  by  Fred  Ma 


Fred  Ma 

Defence  Scientist,  DRDC  CORA 

Approved  by 

Original  signed  by  Charles  Morrisey 
Charles  Morrisey 

Acting  Section  Head,  Joint  &  Common  OR 

Approved  for  release  by 

Original  signed  by  Dale  Reding 
Dale  Reding 

Chief  Scientist,  DRDC  CORA 


This  work  was  sponsored  by  Canadian  Forces  Experimentation  Centre,  Command  and  Sense 
Team,  under  project  Polar  Guardian. 


Defence  R&D  Canada  -  Centre  for  Operational  Research  and  Analysis  (CORA) 


©  Her  Majesty  the  Queen  in  Right  of  Canada,  as  represented  by  the  Minister  of  National  Defence,  2009 



Abstract 


In  sharing  information  to  improve  situational  awareness,  other  government  departments  and 
remotely  situated  outposts  may  vary  in  their  reporting  of  information.  A  social  network  analysis 
was  initiated  within  the  Department  of  National  Defence  to  show  where  informal  communication 
may  be  significant  to  information  sharing.  The  study  was  undertaken  circa  Q3  2006  by  the 
Experimentation  Operational  Research  Team  at  the  Canadian  Forces  Experimentation  Centre  for 
the  Command  and  Sense  Team.  Analytical  results  are  not  available,  as  the  undertaking  was  not 
completed.  This  report  describes  the  lessons  learned  in  planning  the  data  collection  and 
preparation  for  the  social  network  analysis. 

The  work  was  done  under  project  Polar  Guardian,  the  goal  of  which  was  to  assess  situational 
awareness  in  the  arctic.  The  plan  for  the  social  network  analysis  included  an  initial  email-based 
phase  and  a  follow-up  survey-based  phase.  This  report  focuses  on  the  email  phase;  it  is  not  a 
comparison  of  the  two  phases  as  separate  approaches. 

Due to the short time frame for conducting the trial on the social network analysis approach,
in-house methods for data acquisition and analysis were explored. The main challenges in this
approach  arise  from  generating  the  communications  data  from  email  tracking  logs  in  isolation 
from  other  information  gathering  and  information  providing  parts  of  a  corporate  computer 
network. 

Commercial  tools  were  investigated,  and  warrant  further  examination.  Their  use  requires  a  longer 
time  frame  for  approval  and  installation  on  the  Defence  Wide  Area  Network. 

Among the commercial and home-grown approaches, those that involve access to the servers and
deployment of applications on them are likely to need the most time.

Direct  access  to  subject  matter  expertise  in  email  administration  is  essential  to  arriving  at  a  means 
for  effective  and  timely  data  gathering  and  preparation.  Such  access  is  also  essential  for  an 
interagency  social  network  analysis,  the  issues  of  which  are  touched  upon  in  this  technical 
memorandum  only  at  a  high  level. 
Executive summary


Data  Acquisition  and  Preparation  for  Social  Network  Analysis 
Based  on  Email:  Lessons  Learned 

Fred  Ma;  Dave  Allen;  Patrick  Dooley;  DRDC  CORA  TM  2009-030;  Defence  R&D 
Canada  -  CORA;  June  2009. 

Introduction:  Arctic  security  is  becoming  an  area  of  growing  concern.  In  support  of  the 
Canadian Forces Experimentation Centre’s (CFEC’s) Command and Sense team¹, a social
network  analysis  (SNA)  was  undertaken  circa  Q3  2006  by  the  CFEC’s  Experimentation 
Operational  Research  Team  (EXORT)  to  provide  visibility  into  the  informal  sharing  of 
information  between  agencies  and  within  the  Department  of  National  Defence  (DND)  that  can 
affect  situational  awareness  in  the  arctic.  It  was  part  of  a  larger  project,  Polar  Guardian,  whose 
purpose  included  (at  different  times)  modelling  ‘As-Is’  surveillance  capabilities,  identifying 
shortcomings,  and  modelling  ‘To-Be’  capabilities  to  guide  decisions  on  the  way  ahead.  As  a  first 
step,  an  SNA  was  considered  important  because  of  the  expected  variability  with  which  different 
organizations  report  information,  especially  in  remote  regions.  A  view  of  information 
communicated  informally  would  give  an  idea  of  the  accuracy  of,  and  possibly  augment, 
modelling  of  standard  operating  procedures  for  reporting. 

Since  interagency  sharing  of  information  is  anything  but  trivial,  an  internal  DND  SNA  was  first 
undertaken  on  the  Defence  Wide  Area  Network.  Due  to  changing  priorities  in  CFEC’s 
transformation,  however,  Polar  Guardian  was  terminated  in  the  data  acquisition  planning  phase 
and  the  SNA  was  not  performed.  This  technical  memorandum  captures  lessons  learned  in  the 
acquisition  and  preparation  of  data  on  corporate  email,  which  comprises  the  first  phase  of  the 
SNA;  it  is  not  a  comparison  of  email-  and  survey-based  SNA.  The  challenges  to  the  former  (not 
known  at  the  outset)  were  driven  by  constraints  in  administration,  policy,  time,  cost,  and  access  to 
technical  expertise.  They  differ  from  those  for  an  SNA  of  communications  in  an  experiment- 
specific  common  operating  environment;  the  volume  of  data  is  larger,  and  issues  arose  from  the 
fact  that  only  tracking  logs  were  accessible  (pending  legal  approval,  which  was  being  pursued 
when  Polar  Guardian  was  put  on  hold). 

Results:  The amounts and forms of user identification data were highly variable, and results from
efforts  in  characterizing  the  data  are  documented.  Home-grown  approaches  to  user  identification 
are  mapped  out,  along  with  their  limitations,  uncertainties,  and  challenges.  The  expedient 
approach  of  discarding  irresolvable  data  would  yield  an  SNA  of  unknown  accuracy.  Commercial 
tools  for  generating  the  data  were  vetted  based  on  the  constraints.  Recommendations  are  given 
for  future  SNAs,  including  activities  to  start  early  in  an  SNA  due  to  potentially  long  resolution 
times. 

As an alternative to home-grown solutions, an investigation is suggested of the capabilities built
into the mail server software, as is re-examination of commercial tools that provide summary
statistics (possibly using directory services for user identification). These require a longer time
frame for access/installation approvals. For commercial products, time is also needed to try the
product out, and for purchasing administration.


1  CFEC has since reorganized into different teams.

Access  to  subject  matter  expertise  in  email  administration  and/or  Microsoft  Exchange  Server®  is 
essential, especially if it includes knowledge about email administration within DND. Besides
being required for solutions implemented on the mail servers, such expertise
narrows down the contingencies for which solutions need to be planned. It also informs the
assessment  of:  (1)  how  the  SNA  is  impacted  by  a  solution’s  shortcomings  in  identification,  (2)  the 
challenges  and  risk/uncertainties  (technical  and  administrative/policy),  and  (3)  the  resources  and 
level  of  effort  needed  to  accommodate  or  mitigate  the  challenges  and  uncertainties.  For  example, 
knowledge  of  the  degree  to  which  solutions  encroach  upon  security-motivated  restrictions  can  be 
taken into consideration in planning the implementation of those solutions; any measures that can
be taken to minimize the encroachment improve the chances of approval for such solutions.

Significance:  Interagency  sharing  of  information  involves  communication  between  different 
organizations,  with  variability  in  training  and  culture  (corporate  and  social).  Compared  to 
communication  within  DND,  therefore,  it  is  reasonable  to  expect  a  greater  variation  in  adherence 
to  formal  procedures  for  prompt  reporting  of  information,  particularly  in  remote  locations.  An 
SNA  can  indicate  where  informal  communications  can  be  significant  to  situational  awareness. 
The  discovered  challenges,  potential  approaches  to  their  solution,  and  lessons  captured  here  can 
inform  the  planning  of  future  SNAs. 

Future  plans:  For  an  email  SNA  within  DND,  the  approaches  scoped  out  in  this  document  vary 
in  detail.  Some  require  further  investigation,  and  a  solution  to  user  identification  is  yet  to  be 
implemented.  For  an  interagency  SNA,  the  detailed  challenges  are  largely  unknown,  though 
anticipated  challenges  at  a  general  level  are  presented.  The  analyst  for  such  a  study  should  work 
with  subject  matter  experts  to  map  out  technical,  policy,  and  cultural  challenges  and  solutions.  It 
is expected that a prior SNA within DND would help generate buy-in among other government
departments (OGDs). Though there are trade-offs, surveys and interviews can be used
exclusively if generating data from email is beyond the scope of the SNA².


2  In  terms  of  effort,  resourcing,  and  time  frame. 
1  Introduction


Polar  Guardian  was  a  project  undertaken  by  the  Canadian  Forces  Experimentation  Centre’s 
(CFEC’s) Command and Sense (C&S) team⁵ to assess and improve situational awareness (SA) in
the  arctic.  Under  this  project,  circa  Q3  2006,  CFEC’s  Experimentation  Operational  Research 
Team  (EXORT)  launched  a  study  of  the  social  network  of  relevant  organizations  to  better 
understand  the  flow  of  information  pertaining  to  SA  in  the  arctic.  Due  to  CFEC’s  planned 
transformation into the Canadian Forces Warfare Centre (CFWC), this project was put on hold.

This  report  captures  the  lessons  learned  about  preparing  for  such  a  social  network  analysis  (SNA), 
to serve as a springboard for future efforts. In particular, this report focuses on an initial
email-based phase, which was to be followed by a survey-based phase; these two phases are not treated
as separate approaches to be compared. The goal is, as much as possible, to save those who carry out
future SNAs from repeating the means-oriented investigations⁶ that were performed for email under
Polar Guardian, and to guide any further investigations with the observations, conclusions,
conjectures, and reasoned ramifications from this effort. Therefore, this technical memorandum is means-oriented rather
than  ends-oriented. 

Since access to expertise in email administration was somewhat limited, this report contains a fair
amount of reasoned speculation about the requirements and about approaches to data preparation.

Prior  to  the  cessation  of  Polar  Guardian,  much  of  the  SNA  effort  was  devoted  to  finding  and 
liaising  with  the  right  people  to  obtain  information  required  to  prepare  the  data,  devising  plausible 
methods  to  do  so  in  the  presence  of  the  challenges  and  constraints,  and  vetting  candidate  tools. 
After  Polar  Guardian’s  cessation,  most  of  the  effort  was  devoted  to  studying  sample  email  log 
files  and  Global  Address  List  (GAL)  data  to  flesh  out  the  ideas  for  in-house  methods  for  user 
identification  and  compilation  of  data  for  input  into  the  SNA. 

1.1  Motivation  for  the  SNA 

This  section  reviews  the  history  that  culminated  in  the  launching  of  the  SNA. 

CFEC's  C&S  team  was  interested  in  determining  the  ‘As-Is’  capability  to  maintain  SA  in  the 
arctic,  identifying  shortcomings  in  this  capability,  and  modelling  ‘To-Be’  deployment  of  assets 
and  doctrine  as  a  remedy.  From  discussion  with  C&S,  it  was  determined  that  the  need  for  SA 
arises  from  the  following,  some  of  which  are  mentioned  in  [1]  and  [2]: 

•  The  concern  has  been  expressed  that  many  persons/vehicles  enter  the  Canadian  North 
undetected. 

•  The  annual  periods  during  which  the  Northwest  Passage  will  be  navigable  are  expected  to 
lengthen  in  coming  years. 


5  CFEC  has  since  reorganized  into  different  teams. 

6  As  opposed  to  ends-oriented.  Much  of  the  effort  was  in  establishing  a  means  to  acquire  data  for  SNA 
rather  than  the  SNA  analysis  itself. 
•  A means is needed to detect terrorism or industrial accidents that result in ecological crisis
(e.g.  pertaining  to  shipping,  pipelines,  or  oil  pollution),  to  respond  remedially,  and  to 
identify  and  prosecute  those  responsible. 

•  Organized  crime  is  attracted  to  the  arctic  diamond  trade  for  the  purposes  of  money 
laundering  and  manipulation  of  output  diamond  quality. 

•  Drug and human trafficking are currently commonplace in the North.

•  There  are  territorial  disagreements  with  other  nations,  e.g.  the  U.S.  and  Denmark.  Arctic 
countries are mapping out their continental shelves, since this can potentially support
their  offshore  territorial  claims. 

•  Search  and  rescue  missions  are  conducted  by  the  Department  of  National  Defence’s 
(DND’s)  Joint  Task  Force  North  (JTFN).  The  Royal  Canadian  Mounted  Police  (RCMP)  has 
the  official  responsibility,  but  often  not  the  capability.  Public  Safety  and  Emergency 
Preparedness  Canada  (PSEPC)  is  the  main  organization  for  health/safety/emergencies.  It  is 
usually  supported  by  DND  and  the  RCMP.  DND  also  supports  local  authorities,  but  plays  a 
more  active  role  in  arctic  regions. 

Project  Polar  Guardian's  original  emphasis  was  on  modelling  the  capabilities  relevant  to  SA  in 
order  to  optimize  deployment  of  surveillance  technology  and  surveillance  practices.  Information 
sharing  with  other  government  departments  (OGDs)  and  industry  was  considered  an  important 
part  of  this  because  of  the  vast  expanse  of  arctic  land,  the  sparse  population,  sparse  assets, 
sparse/infrequent  surveillance,  and  the  fact  that  DND  is  not  the  only  Department  that  operates  in 
the  North. 

After  investigating  possible  approaches  to  modelling  the  sensor  coverage  and  information 
sharing,  EXORT  members  suggested  separating  the  modelling  for  the  two  aspects.  C&S  opted  to 
focus  first  on  information  sharing  between  agencies,  due  to  discussions  at  the  Arctic  Surveillance 
Interdepartmental  Working  Group  (ASIWG)  about  how  it  would  dramatically  improve  SA  in  the 
immediate  term.  The  aim  would  be  to  determine  whether  the  right  people  become  aware  of 
relevant  sightings  and  reports  under  various  scenarios. 

A  major  challenge  was  anticipated  in  modelling  the  ‘As-Is’  information  sharing  based  on  formal 
reporting  procedures  —  it  was  not  known  how  rigorously  standard  operating  procedures  (SOPs) 
are  followed.  The  following  paragraphs  elaborate  on  two  reasons  for  this.  The  first  is  that,  in 
contrast  to  intra-military  information  flow,  the  rigor  and  uniformity  of  training  in  OGDs  and 
industry  to  follow  SOPs  for  sharing  information  is  unknown,  as  is  the  extent  to  which  such  formal 
procedures  exist.  This  variability  or  uncertainty  in  following  SOPs  may  be  amplified  by  cultural 
differences in the arctic, and is compounded by the second reason: understaffing and the
possibility  of  a  more  casual  attitude  toward  procedures  for  prompt  reporting  of  information.  For 
these  reasons,  it  would  not  be  unlikely  for  information  to  be  shared  along  informal  lines  of 
communication. 

Regarding  the  first  reason,  OGDs  and  industry  operate  in  different  environments  and 
circumstances  from  the  military.  Training  in  steadfastly  following  doctrinal  procedures  cannot  be 
expected  to  be  uniform  across  organizations  with  cultures  that  can  be  very  different.  To  be  sure, 
departure  from  doctrine  is  not  necessarily  bad;  in  fact,  it  may  be  seen  as  locally  adaptive  to 
circumstances, and potentially an optimization of practice. However, it introduces a large
unknown  in  a  model  of  information  sharing  based  on  SOPs. 

The  second  reason  for  possible  disparity  between  SOPs  and  actual  information  dissemination 
came from EET, whose visit to the Arctic for ASIWG revealed that regulatory offices can be
understaffed.  This  can  compromise  the  rigor  with  which  procedures  (particularly  administrative 
paperwork)  are  followed.  Conversely,  the  remoteness  and  small  community  size  can  result  in 
government  offices  being  in  close  proximity,  thus  increasing  the  likelihood  of  informal 
information  sharing. 

To get a better idea of whether and where informal information sharing might play a significant role, a
social network analysis (SNA) was proposed. This involves gathering data to show the linkages
within a group of offices/people, notionally represented as nodes in a network, also known as an
SNA graph⁷. The data can be statistics on the volume of communications between nodes, or surveys
to educe relationships of various types between nodes. Analysis of the resulting networks can
reveal cliques⁸ of, and barriers to, information sharing. It can also reveal central nodes that are
either  bottlenecks  or  facilitators  of  information  sharing,  and  individuals  that  are  key  to  bridging 
any  cliques.  The  SNA  would  augment  the  model  of  formal  reporting  pathways,  if  it  did  not 
indicate  other  approaches  to  modelling  as  preferable.  The  SNA  would  also  be  a  test  of  its  utility 
in  understanding  the  flow  of  information  pertaining  to  arctic  SA.  Books  and  articles  on  SNA  are 
provided  in  the  bibliography,  while  Annex  A  provides  an  introduction  to  concepts  and  typical 
analyses  within  the  context  of  the  popular,  free  SNA  software  ‘Pajek’. 
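
As a rough illustration of the concepts above (not the method used or planned under Polar Guardian), the following Python sketch builds a small network from sender/recipient pairs and reports link weights, a simple centrality measure, and bridging links. The message list, node names, and function names are all hypothetical, and only the Python standard library is assumed.

```python
from collections import Counter, defaultdict

# Hypothetical (sender, recipient) pairs, as might be derived from email logs.
messages = [
    ("a", "b"), ("b", "a"), ("a", "c"), ("b", "c"),
    ("c", "d"),
    ("d", "e"), ("e", "f"), ("f", "d"), ("d", "f"),
]

def build_network(pairs):
    """Count messages per unordered pair of nodes (undirected, weighted links)."""
    weights = Counter()
    for sender, recipient in pairs:
        if sender != recipient:
            weights[frozenset((sender, recipient))] += 1
    return weights

def neighbours(weights):
    """Map each node to the set of nodes it exchanged messages with."""
    adj = defaultdict(set)
    for edge in weights:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    return adj

def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def count_components(adj, skip_edge=None):
    """Number of connected groups, optionally ignoring one link."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            for nbr in adj[node]:
                if frozenset((node, nbr)) != skip_edge:
                    stack.append(nbr)
    return count

def bridges(weights, adj):
    """Links whose removal would split the network into more groups."""
    base = count_components(adj)
    return [tuple(sorted(edge)) for edge in weights
            if count_components(adj, skip_edge=edge) > base]

weights = build_network(messages)
adj = neighbours(weights)
print(degree_centrality(adj))
print(bridges(weights, adj))
```

In this toy data, nodes a, b, c and d, e, f form two tightly linked groups joined only through the c-d link, so c and d are the individuals who bridge the groups. A tool such as Pajek provides these and many other measures directly.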

The  SNA  would  first  be  performed  on  DND  communications.  The  results  would  then  be  used  to 
encourage  the  involvement  of  OGDs  and  industry. 

Analyses  that  go  beyond  SNA  were  also  considered,  such  as  tracking  the  timing  of  information 
diffusion⁹, e.g. by subject.


1.2  Data  Gathering  Strategy 

Because  people  do  not  always  behave  in  the  way  that  they  report  on  a  survey,  the  data  for  SNA 
was  to  be  gathered  from  both  communications  statistics  and  surveys.  Reference  [3]  provides  a 
template  for  building  an  SNA  survey  and  conducting  the  analysis.  Most  of  the  effort  in  Polar 
Guardian’s  SNA  was  devoted  to  arranging  the  acquisition  of  email  log  data  and  devising 
approaches  for  its  preparation;  the  survey  would  serve  as  a  follow-up  phase  to  the  SNA  of 
electronic  communications.  The  aim  was  to  have  a  social  network  graph  and  analysis  done  in 
time  for  the  fall  ASIWG  meeting  so  that  it  could  be  used  to  educate  the  participants  about  SNA. 
Social network analysts have found that this can be quite engaging for participants; this could be
used to encourage a high return rate for the survey.


7  Refer  to  Annex  A  for  background  and  example. 

8  “Clique”  has  a  rigorous  mathematical  definition,  but  is  used  here  in  the  general  sense. 

9  This  was  at  a  discussion  level,  and  the  meaning  of  diffusion  was  quite  open  at  the  time  e.g.  not  only  as  an 
email  propagated  to  its  recipients,  but  also  who  the  recipients  replied  to  or  forwarded  it  on  to. 
While follow-up contact with those being surveyed improves the return rate, it has been found
that  offering  gifts  for  returned  surveys  dramatically  increases  the  return  rate  [3].  The  apparent 
banality  of  this  suggestion  belies  its  importance.  A  past  survey  conducted  at  ASIWG  for  a 
different  but  related  purpose  had  a  very  low  return  rate.  In  conducting  a  survey,  it  is  generally  not 
acceptable  to  repeat  the  survey  and  ask  those  surveyed  to  fill  out  a  form  again,  simply  because  the 
returned data fell short of expectations due to corners cut. As there is no second chance to make a
first  impression,  one  way  to  avoid  irreparable  shortcomings  and  maximize  the  data  return  is  to 
avoid  being  too  economical  with  the  gift  e.g.  a  nice  pen  rather  than  an  economy  pen.  This  is 
especially true for an SNA, due to the interconnected nature of a network. Missing links between
nodes  do  not  simply  give  you  less  data;  they  can  affect  the  relevant  patterns  in  the  network  and 
the  conclusions  in  nontrivial  ways.  Though  the  culture  of  the  organization  in  question  will 
determine  how  acceptable  it  is  to  use  gifts  as  an  incentive,  the  entire  cost  of  conducting  the  SNA 
should  be  kept  in  mind,  as  should  the  price  of  compromising  the  accuracy  with  limited  data10. 

The  candidate  types  of  information  on  electronic  communication  identified  for  this  SNA  were 
traffic  volumes  for  email  and  phone  calls.  The  feasibility  of  obtaining  the  data  to  compile  this 
information,  and  the  adequacy  of  the  data  in  the  records,  was  to  be  explored.  Email  was  to  be 
attempted  first.  As  it  turns  out,  this  had  enough  challenges  that  it  became  the  sole  focus. 

The  duration  over  which  statistics  would  be  compiled  would  be  weeks  or  months,  to  establish  a 
steady-state traffic pattern11. Statistics would then be compiled for the duration following a
significant  event  of  interest,  to  see  whether  they  differed  noticeably  from  steady-state. 


10  The  use  of  rewards  to  improve  survey  return  rates  is  standard  practice.  Further  research  is  needed, 
however,  to  distinguish  between  its  benefits  in  a  public  setting  versus  a  corporate  setting,  and  in 
government  specifically.  Discussion  with  SMEs  in  DND  personnel  did  not  reveal  a  history  of  rewards 
being  regularly  used  for  surveys  within  DND  specifically,  and  CF  members  are  not  permitted  to  receive 
such  rewards.  If  hinterland  offices  are  indeed  regularly  and  severely  understaffed,  it  may  be  difficult  to 
conceive  of  a  reward  that  effectively  motivates  the  staff  to  give  priority  to  a  survey.  Practices  to  improve 
survey  return  rates  require  further  research  (reference  [3]  briefly  mentions  alternatives). 

11 This is only to a “first order”. There may be an annual or monthly seasonality to the data pattern. The
SNA  may  reveal  any  such  periodicity,  as  well  as  the  prospects  of  taking  such  periodicity  into  account  in 
“baselining”  the  traffic. 
2  Challenges to Acquisition and Preparation of Data
on  Email  Traffic  Volume 


This  chapter  describes  the  challenges  encountered  in  obtaining  email  traffic  volume  data  for  Polar 
Guardian.  In  the  course  of  attempting  to  resolve  these  challenges,  participants  in  a  past  SNA  for 
Multi-National  Experiment  4  (MNE  4)  [4] [5]  were  consulted.  From  these  discussions,  it  seemed 
that  many  of  the  challenges  arose  from  the  fact  that  the  statistics  were  being  compiled  across 
DND’s corporate email rather than for communications in a smaller, more dedicated collaborative
environment, e.g. for an experimental scenario, where the analysts may have more control
over  the  infrastructure  or  more  access  to  those  responsible,  files  are  smaller,  there  might  be  only 
one  mail  server,  fewer  administrative  and  policy  barriers,  and  more  direct  access  to  subject  matter 
experts  (SMEs).  This  report  will  sometimes  refer  to  these  challenges  as  constraints,  since  a 
possible  way  forward  may  entail  working  within  restrictions  (technical  or  nontechnical)  rather 
than  overcoming  or  removing  them.  This  generally  translates  into  more  work,  more  complicated 
solutions,  or  less  confidence  in  the  resulting  data. 

Some  of  the  constraints  might  not  apply  to  future  efforts,  depending  on  the  authority  behind  the 
request  for  data,  and  possibly  with  much  additional  time  for  administrative  pursuit.  The  time 
frame  for  Polar  Guardian  was  to  obtain  example  results  to  bring  to  the  fall  ASIWG  meeting.  Not 
only  would  that  provide  a  check  of  the  methodology,  but  it  would  also  have  motivated  discussion 
and  hopefully  generated  buy-in  at  ASIWG. 

The  challenges  are: 

1.  Only  the  Exchange  Server®  tracking  log  files  would  be  provided.  No  other  data  would  be 
generated  or  provided  by  the  IT  administrative  personnel. 

2.  No  pre-processing  or  filtering  of  log  files  by  their  providers,  Director  Information 
Management  Engineering  and  Integration  (DIMEI). 

Together  with  discussions  with  an  executor  of  a  past  project  involving  compilation  of  email 
statistics, these led to the following assumed constraints.

3.  No  access  to  email  servers. 

4.  No  applications  of  any  kind  (commercial  or  homemade)  would  be  deployed  on  the 
Exchange servers12 to filter or process13 the email traffic volume data, or collect it in any
way  e.g.  by  sending  distilled  statistics  to  an  SQL  server. 

The above constraints made it necessary to work with the raw tracking log files, which led to the
following  challenges. 


12  A  specific  deployment  of  Exchange  Server®  and/or  the  hosting  machine  is  referred  to  as  an  Exchange 
server  (or  simply  “server”). 

13  Any  manner  of  gathering  data  at  the  servers,  pre-processing  the  data,  or  reconstituting  it  in  any  way 
required. 
5.  No straightforward scripted/automated access to directory services on the Defence Wide
Area  Network  (DWAN)  for  email  user  identification,  if  required14.  Note  that  this  may  in 
fact  be  possible,  with  the  right  expertise,  or  a  similar  functionality  could  be  improvised.  For 
example,  it  was  found  after  Polar  Guardian  was  halted  that  the  DWAN’s  user  lookup 
database  (the  Global  Address  List,  or  GAL)  could  be  downloaded  as  a  comma-separated- 
values  (CSV)  file.  This  file  can  be  imported  into  a  database  that  can  be  locally  queried  to 
complete  partial  identities  from  the  log  files.  Scripting  languages  can  implement  similar 
functionality  by  reading  the  CSV  data  into  a  lookup  table.  However,  the  speed  implications 
and applications development time would have had to be more fully explored (Section
4.5.1.2).

It  was  not  initially  apparent  that  the  DWAN’s  directory  services  were  needed  to  identify  email 
sender/recipients.  Early  communications  with  DIMEI  indicated  that  email  headers  were 
contained  in  the  Exchange  Server®  tracking  log  files.  In  IT,  email  headers  typically  refer  to 
specific  fields  of  information,  including  complete  sender  and  recipient  IDs,  formatted  according 
to the world-wide standard RFC 2822 (a.k.a. “RFC2822”) [6]. When example log files were
obtained  midway  through  the  data  acquisition  effort,  it  became  clear  that  “headers”  was 
interpreted  in  a  much  more  general  sense,  and  did  not  contain  the  required  data.  The  additional 
step  of  identifying  sender/recipient  needed  exploration. 

6.  Large  files  handling.  This  must  be  kept  in  mind  when  planning  how  and  where  to  store  and 
process  the  data  to  compile  the  email  traffic  statistics.  Regardless  of  whether  the  logs  are  first 
filtered  before  stats  are  compiled,  and  whether  a  database  is  used,  the  front  end  of  the  data 
preparation  needs  to  be  able  to  handle  large  volumes  of  data.  The  initial  figure  provided  by 
DIMEI  was  2.5GB  of  logs  per  day,  and  it  was  envisioned  that  the  SNA  would  use  data  over  a 
period  of  weeks  or  months.  Reduced  requirements  are  estimated  in  Section  3. 

7.  Email  server  version  clarification.  The  initial  information  described  the  log  files  as  both 
versions  Exchange  Server®  5.5  and  2003.  Throughout  the  duration  of  the  SNA  effort,  it  was 
thought  that  a  major  difference  between  them  was  that  the  sender/recipient  email  address  is 
readily  apparent  for  the  latter,  but  not  in  many  of  the  transaction  records  for  the  former. 
Accommodations  had  to  be  made  to  process  both  formats  unless/until  further  information  was 
obtained  indicating  that  only  one  of  the  two  formats  had  to  be  supported.  This  in  fact 
happened  midway  through  the  SNA  effort.  Discussions  with  DIMEI  personnel  indicated  that 
Ottawa  servers  were  the  seemingly  more  cryptic  5.5  version.  DIMEI  is  in  the  process  of 
migrating  to  Exchange  Server®  2003,  however,  and  the  transition  would  be  completed 
toward  the  end  of  2006. 

8.  Question  of  deducing  email  sender/recipient  ID.  It  was  not  clear  what  was  required  to 
convert  the  data  in  the  5.5  logs  into  unique  sender/recipient  IDs15,  or  whether  it  was  even 
possible.  Conflicting  technical  information  came  from  different  sources  (DIMEI,  tool 
vendors). 


14  In  this  report,  a  number  of  approaches  to  identifying  senders/recipients  are  considered,  not  all  of  which 
require  scripted/automated  access  to  directory  services.  It  would  be  advisable  to  consult  responsible 
experts to ensure that the manner in which such services are used for an SNA is legal and ethical.

15  The  question  of  anonymizing  user  identity  data  had  not  yet  been  broached  in  this  effort,  so 
constraint/challenge#8  refers  to  identification  of  a  sender/recipient  regardless  of  whether  the 
sender/recipient  is  anonymized. 




9.  Need  for  information  on  Exchange  Server®  5.5  log  files.  The  5.5  log  files  are  in  a 
proprietary  Microsoft™  format.  Tables  are  available  describing  the  fields  in  a  general  sense, 
but  not  sufficiently  to  decipher  the  code  within  the  fields.  According  to  Microsoft™  support, 
Exchange  Server®  5.5  is  archaic,  as  is  the  log  file  format,  and  the  people  familiar  with  the  log 
file moved on many years ago. From the tone of the conversation, it seemed that 5.5
was  an  ad-hoc  transitional  format  in  the  evolution  of  the  server  software. 

10.  Legality. DIMEI had determined that the SNA and/or the acquisition of the email server log
files  potentially  constituted  monitoring.  They  required  assurance  from  legal  advisers  that  the 
endeavour  did  not  violate  ethical  or  privacy  policies. 

As  an  example  of  past  efforts  in  which  not  all  the  above  constraints  were  present  (or  in  which 
some  were  dealt  with),  DIMEI  conducted  studies  in  2004  and  2006  that  used  email  server  log 
files  to  analyze  traffic  at  a  server-to-server  level16.  (In  contrast,  the  SNA  requires  identification 
of  senders/recipients  down  to  the  user  level).  The  goal  was  to  use  the  traffic  data  as  input  to  their 
OpNet  network  simulation  tool.  A  number  of  people  were  involved,  including  a  Microsoft™ 
Exchange  expert.  Order-of-magnitude  time  frames  for  the  data  acquisition  were  as  follows. 

•  The  first  approach  involved  several  weeks  of  scripting  to  extract  the  email  traffic  data  from 
the  log  files. 

•  The  second  approach  involved  two  months  of  approval  seeking  to  deploy  custom 
applications  onto  the  Exchange  servers,  which  extracted  distilled  email  traffic  data  on  a 
periodic  basis  and  sent  it  to  a  separate  database. 

The  second  approach  avoided  the  need  to  process  extremely  large  volumes  of  email  server  log 
data17,  but  required  significant  lead  time  for  approval,  as  well  as  Exchange  Server®  expertise. 


16  Mr.  Donald  Messier  (formerly  Major)  participated  in  a  feasibility  assessment  for  centralizing  the 
messaging infrastructure. A report “Common E-mail Centralization Study 22 Dec 2004” was produced
by DIMEI 3-4 for internal DIMEI use. Currently, the DIMEI subgroups have been consolidated into a single
DIMEI group. Refer to Annex Section J.1 for current contact details.

17  It  was  not  established  whether  the  log  files  used  in  the  DIMEI  study  were  the  same  as  the  tracking  log 
files  that  would  be  provided  for  this  SNA,  nor  what  order  of  magnitude  were  the  log  file  sizes. 
3  Data Acquisition and Preparation: Requirements
and  Considerations 


Initially,  the  plan  was  to  test  the  use  of  the  SNA  on  email  within  CFEC,  and  then  to  expand  the 
method  to  look  at  traffic  between  DND  mailboxes  relevant  to  arctic  SA.  In  consultation  with 
C&S, it was decided that such mailboxes would be within JTFN and Canada Command18.

The  value  of  this  phased  approach  became  questionable  as  more  information  was  gathered  about 
the  organization  of  the  servers. 

1.  Pending legal approval, DIMEI was willing to provide any Exchange Server® log files
requested.  There  was  no  requirement  that  the  value  of  an  SNA  be  shown  on  local  email 
traffic  before  the  log  files  of  other  servers  were  provided. 

2.  From  discussion  with  DIMEI19,  it  became  clear  that  the  initially  estimated  2.5GB  of  daily  log 
data  assumed  that  EXORT  was  provided  with  the  tracking  logs  for  all  email  throughout  all  of 
DND.  In  fact,  mailboxes  are  divided  among  approximately  100-150  Exchange  servers,  each 
handling on the order of 1000 mailboxes and generating a daily log file on the order of
25MB. From later discussion with CFEC’s IT department (Synthetic Environment And
Modelling  &  Simulation  Team,  or  SEAMS  Team),  it  seemed  highly  likely  that  users  were 
assigned  to  servers  based  roughly  on  location  (geographical  and/or  on  the  organizational 
chart). 

With  the  above  knowledge,  the  focus  shifted  away  from  establishing  and  demonstrating  an  SNA 
approach  based  on  the  small  amount  of  email  within  CFEC.  The  main  problem  seemed  to  be 
determining  how  many,  and  which,  servers  were  of  interest  for  the  DND  traffic.  This  affects  the 
volume  of  data  that  had  to  be  handled,  and  strongly  determines  whether  any  devised  approach  is 
practical. 

Just  as  important,  however,  is  ensuring  that  the  method  demonstrated  on  DND  email  could  be 
feasibly  expanded  for  interagency  email.  Again,  this  moves  the  volume  of  log  data  to  another 
level.  The  issue  is  compounded  by  the  fact  that  email  will  be  flowing  between  multiple  domains, 
the  implications  of  which  have  not  been  adequately  investigated  in  this  project.  With  regard  to 
data  volume,  based  on  their  involvement  in  ASIWG,  C&S  estimated  the  number  of  relevant 
agencies  to  be  approximately  thirty.  This  is  confirmed  in  Annex  C,  where  organizations 
attending  ASIWG  2006  were  culled  from  the  minutes. 

For  the  purpose  of  analyzing  DND  email,  it  was  not  known  whether  JTFN  and  Canada  Command 
personnel’s  mailboxes  resided  together  on  one  server.  A  server’s  capacity  might  allow  for  such  a 
grouping,  but  EXORT  lacked  visibility  into  the  actual  groupings.  It  was  prudent,  therefore,  to 
make  allowances  for  the  division  of  personnel  of  interest  among  more  than  one  server. 


18 To be rigorous, the intra-DND SNA should also include JTFP and JTFA because of their maritime
component. However, this was just a trial to establish the SNA process. The final aim was to have an
extra-DND SNA that includes OGDs.

19  This  includes  discussions  with  Mr.  Donald  Messier  (footnote  16,  p.  6)  about  past  studies,  as  well  as  with 
operational  personnel  in  DIMEI  about  getting  server  log  data. 
3.1  Intra-DND Considerations


Together with an estimate of the number of relevant servers, the approximate log file size of
25MB/server/day  mentioned  above  can  be  used  for  an  order-of-magnitude  estimate  of  the  volume 
of  the  log  files  to  be  processed.  The  estimate  of  two  servers  each  for  JTFN  and  Canada 
Command  was  initially  used;  later,  more  margin  was  given  by  assuming  five  to  seven  servers 
total.  The  only  way  to  be  sure  about  the  server  count  was  to  collect  the  names  of  personnel 
relevant  to  arctic  awareness  and  look  up  the  servers  that  they  reside  on.  C&S  had  initiated  an 
inquiry  to  get  a  list  of  such  names,  and  the  information  was  forthcoming.  In  the  final  interagency 
phase,  however,  the  one-to-two-fold  margin  for  server  counts  within  DND  would  be  of  less 
significance  because  the  possible  inclusion  of  approximately  thirty  agencies  was  anticipated. 

3.2  Interagency  Considerations 

For  the  interagency  SNA,  the  assumption  of  one  server  per  non-DND  agency  was  considered  to 
be overly optimistic (in terms of simplicity). Since three servers for each of JTFN and Canada
Command were assumed, however, and DND likely has more resources than most agencies, the
estimate  of  two  servers  for  each  of  the  thirty  agencies  seemed  to  be  a  reasonable  starting  point. 
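
The server-count assumptions translate directly into order-of-magnitude log volumes. The following is a minimal sketch of that arithmetic, assuming the 25MB/server/day figure holds for every server and using decimal units (1GB = 1000MB). The function name `log_volume_gb` and the two-week window are illustrative choices rather than part of the study.

```python
MB_PER_SERVER_PER_DAY = 25  # order-of-magnitude figure reported by DIMEI


def log_volume_gb(servers, days, mb_per_server_per_day=MB_PER_SERVER_PER_DAY):
    """Return the total tracking-log volume in GB (1 GB = 1000 MB)."""
    return servers * mb_per_server_per_day * days / 1000


# Intra-DND pilot: assume six servers (midpoint of the five-to-seven margin).
intra_dnd_gb = log_volume_gb(servers=6, days=14)

# Interagency phase: assume two servers for each of thirty agencies.
interagency_gb = log_volume_gb(servers=2 * 30, days=14)

print(f"intra-DND:   {intra_dnd_gb:.1f} GB over two weeks")
print(f"interagency: {interagency_gb:.1f} GB over two weeks")
```

Because every server's full log must be read, the volume scales with the number of servers rather than with the number of mailboxes of interest.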

3.3  One-to-Many  Emails 

The  edges  on  an  SNA  graph  (e.g.  Figure  A-2  and  Figure  A-3  on  p.43)  represent  person-to-person 
communications. The obvious way to handle a multi-recipient email with N recipients is to treat
it as N “artificial” one-to-one emails. It may be more accurate, however, to count each artificial
one-to-one email as less than a real single-recipient email. One need only consider the less
relevant broadcast emails that one receives to realize that a single email sent simultaneously to
N=20 recipients is unlikely to be worth as much as twenty different emails sent individually to
different  people,  at  least  in  terms  of  communication  that  is  indicative  of  a  close  relationship  by 
which  SA  information  might  be  informally  shared.  Intuitively,  the  broader  the  audience,  the  less 
of  a  close  relationship  that  the  broadcaster  has  with  each  individual  (at  least,  as  indicated  by  that 
particular  email).  In  the  extreme  case,  a  message  posted  to  a  completely  public  forum  says  very 
little  about  the  relationship  between  the  author  and  all  the  possible  readers. 

The  reduced  count  value  of  an  artificial  one-to-one  email  can  be  thought  of  as  a  weighting  factor 
that  attenuates  the  default  count  value  of  1  for  each  email  message  in  general.  The  weighting 
factor  is  an  open  question,  but  one  expects  a  total  weight  for  all  recipients  to  increase 
sublinearly20 with the number of recipients, N. For example, one could weight the total
communications $N_{Tot}$ for an N-recipient email as a function of N according to $N_{Tot}(N) = \sqrt{N}$,
yielding a per-recipient weight of $1/\sqrt{N}$.
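
The square-root weighting can be sketched in a few lines. This is a minimal illustration, assuming the total weight is exactly the square root of the recipient count; the function names are hypothetical, and other sublinear functions would serve equally well.

```python
import math


def total_weight(n_recipients):
    """Total weight of one email sent to n_recipients (square-root scheme)."""
    if n_recipients < 1:
        raise ValueError("an email needs at least one recipient")
    return math.sqrt(n_recipients)


def per_recipient_weight(n_recipients):
    """Weight credited to each sender-recipient edge: sqrt(N)/N = 1/sqrt(N)."""
    return total_weight(n_recipients) / n_recipients


for n in (1, 4, 25):
    print(n, total_weight(n), per_recipient_weight(n))
```

Under this scheme a single-recipient email keeps its full count of 1, while each recipient of a four-recipient email is credited half a unit.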


20  There  are  rigorous  ways  to  define  sublinearity,  but  here  it  refers  to  the  behaviour  of  a  single-input,  single¬ 
output  function  in  the  upper-right  quadrant  of  a  Cartesian  graph  (the  only  region  of  interest).  The  goal  of 
the  function  is  to  represent  diminishing  returns,  so  the  slope  is  always  positive,  and  always  diminishes  as 
the  independent  variable  increases. 
Since the analyst’s judgement is needed on the specific weighting scheme, it is illuminating to
consider different weighting examples that can be devised. As an alternative to the unbounded
example $N_{Tot}(N) = \sqrt{N}$, consider the case in which one regards emails with recipient lists
longer than some threshold (say $N_0 = 30$ recipients) as contributing no additional information about
relationships between individuals. One possible scheme may be to have the total weight $N_{Tot}(N)$
for all N recipients defined so that it is asymptotically bounded by the lesser of $N_{Tot}(N) = N$ and
$N_{Tot}(N) = N_0$, with a soft transition where the two cross over (Figure 1). The line $N_{Tot}(N) = N$ is what
an N-recipient email would be worth if it was considered the same as N single-recipient emails.
The diminishing returns of the actual $N_{Tot}(N)$ (solid line) expresses the analyst’s decision that
emails to more than $N = N_0$ recipients begin to enter the regime of mass broadcasts and do not provide much
information about individual-to-individual closeness. Annex B provides formulas that can be
used for such soft-limiting dependence on N.
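
One way to obtain a soft limit of this kind is a smooth minimum of the two bounding lines. The sketch below is an assumption made for illustration only and is not taken from Annex B: the functional form, the name `soft_limited_total`, and the sharpness parameter `p` are all hypothetical.

```python
def soft_limited_total(n, n0=30, p=2.0):
    """Smooth minimum of N and N0: close to N for N << N0, close to N0 for N >> N0.

    Larger p gives a sharper transition near N = N0 (p > 0 required).
    """
    if n < 1 or n0 <= 0 or p <= 0:
        raise ValueError("require n >= 1, n0 > 0 and p > 0")
    return (n ** -p + n0 ** -p) ** (-1.0 / p)


for n in (1, 10, 30, 100, 1000):
    total = soft_limited_total(n)
    print(f"N={n:5d}  total={total:6.2f}  per-recipient={total / n:.3f}")
```

The result always lies below both bounding lines and keeps increasing with the recipient count, so it has the diminishing-returns shape described above. The analyst would still need to choose the threshold and the sharpness of the transition.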


Figure 1: Example of sublinear function $N_{Tot}(N)$ defined as the lesser of $N_{Tot}(N) = N$ and
$N_{Tot}(N) = N_0$, with a soft transition where the two cross over


Developing  algebraic  formulae  with  which  to  conveniently  weight  one-to-many  emails  raises  the 
question  of  email  discussions  among  groups  of  people.  Rigorous  study  of  how  these  discussions 
manifest  themselves  in  an  SNA  lies  outside  the  scope  of  this  report,  though  thoughts  are  put  forth 
for consideration. First, outright spam is not part of an SNA on the sharing of SA information (though it might
serve other purposes related to IT security). That is, the SNA is envisioned to consider situations
in which there is a degree of professionalism and sufficient corporate controls/oversight to
prevent  flagrant  abuse  of  email,  and  countermeasures  against  inadvertent  spamming  e.g.,  through 
computer  infection.  Hence,  one-to-many  emails  are  sent  only  to  groups  of  people  for  good 
professional reason21. These groups of interest (GOIs) can be so designated in order to
distinguish  them  from  the  popular  notion  of  communities  of  interest  (COIs);  the  latter  implies  a 
persistent  group  and  doesn’t  suitably  describe  ephemeral  or  one-time  issues.  Defined  in  this  way, 
GOIs  are  a  superset  of  COIs. 

Note  that  sublinearly  weighting  one-to-many  emails  does  not  preclude  the  manifestation  of  GOIs 
in  an  SNA,  since  the  weighting  is  merely  a  preprocessing  step  applied  to  the  data.  Cliques22  can 
still  be  identified  within  an  SNA  graph  using  whatever  criteria  or  means  that  may  be  of  interest  in 
the  absence  of  sublinear  weighting.  Sublinear  weighting  is  merely  a  way  to  have  broadcast 
emails to (say) N=20 recipients not treated identically to 20 individual emails, since the latter
typically (though not always) reflects more time and effort invested by the sender. At the level of
a GOI rather than an individual recipient, this reflects the fact that it is quite likely for a broadcast
to be of unequal importance (or even relevance) to all N=20 people. If the weighting scheme
yields $N_{Tot}(20) = 13$, for example, this still yields a significant portion of the broadcast
communication on the part of the sender. Each recipient only registers 0.65 units of
communication in this weighting example, but if there are indeed active communications so as to
warrant recognition as a GOI, the partial units will accumulate. Furthermore, the active
disseminators  will  accrue  significant  weighting  on  their  outgoing  communications  to  the  GOI. 

The  following  requirements  follow  from  the  considerations  and  estimates  above  regarding  data 
volume,  intra-DND  email,  and  interagency  email. 

3.4  General  Requirements  and  Estimates 

In  the  following  discussion,  it  is  assumed  that  50  individuals  or  offices  are  identified  to  be  of 
interest to the SNA based on email. The quantity of fifty mailboxes was arrived at based on the
aim  of  overestimating  the  count  beyond  any  number  that  can  be  reasonably  expected.  This 
approach  was  taken  because,  in  consultation  with  C&S,  the  actual  count  of  reporting  individuals 
would  not  likely  be  obtainable  within  a  short  period  of  time.  Gross  over-estimation  is  justified 
because  the  potentially  prohibitive  volume  of  front-end  data  is  due  to  the  number  of  servers  over 
which  the  users  of  interest  (hereafter  referred  to  as  interesting  users)  are  distributed  more  than  the 
actual  number  of  interesting  users  (which  is  not  a  very  formidable  quantity)23. 

1.  In  the  pilot  stage,  the  SNA  will  include  up  to  fifty  mailboxes  within  DND,  distributed  across 
five-to-seven  servers,  each  serving  approximately  1000  mailboxes.  All  servers  will  be  in  the 
same  domain,  and  will  consist  of  Exchange  Server®  5.5  servers  and  Exchange  Server®  2003 
servers. 

2.  If  it  gets  to  the  final  stage,  the  SNA  will  cover  up  to  100  mailboxes  distributed  over 
approximately  sixty  servers,  not  in  the  same  domain,  and  not  in  the  same  corporation24. 


21  This  is  admittedly  open  to  interpretation,  but  again,  a  rigorous  conceptual  framework  better  belongs  in  a 
separate  study  that  goes  beyond  data  gathering  and  preprocessing. 

22  As  on  page  3,  “clique”  has  a  rigorous  mathematical  definition,  but  is  used  here  in  the  general  sense. 

23  In  other  words,  the  entire  (large)  log  file  for  a  server  has  to  be  processed  regardless  of  how  many 
interesting  users  reside  on  that  server. 

24 The different corporations imply a host of uncertainties. The Exchange Server
versions/configuration/operation,  users  per  server,  and  log  file  sizes  may  differ  from  DND’s,  if  in  fact 
3.  The selected SNA software (Pajek, Annex A) requires data consisting of records, each
describing  the  volume  of  email  between  two  mailboxes.  Hence,  the  processing  of  email  data 
should  generate  data  of  the  following  form: 

    MailboxA  MailboxB  Volume of Email
    MailboxA  MailboxC  Volume of Email
    ... etc. ...
    MailboxB  MailboxA  Volume of Email
    MailboxB  MailboxD  Volume of Email
    ... etc. ...

This  requirement  on  the  general  form  of  input  data  is  fairly  generic  for  SNA  software. 

4.  It  is  preferable  that  the  data  acquisition  method  somehow  allow  for  the  reduced  weighting  of 
multi-recipient  emails  as  per  Section  3.3. 

5.  If commercial software is used to compile the input data for the SNA, it must not require
deployment onto the Exchange servers, nor access to the Microsoft™ “Active Directory”
(AD) directory service for user identification. The restriction from accessing the
AD  was  to  circumvent  the  long  lead  time  needed  to  install  applications  on  DWAN.  It  was 
discovered  after  the  cessation  of  Polar  Guardian,  however,  that  the  GAL  can  be  downloaded 
as  a  CSV  file.  It  is  not  necessarily  likely  that  commercial  software  will  be  able  to  use  it  in 
that  form,  nor  is  it  clear  that  the  GAL  data  is  sufficient  to  identify  users  from  tracking  log 
data25;  the  GAL’s  usability  should  be  investigated  on  a  tool  by  tool  basis,  and  its  adequacy 
should  be  explored  if  there  is  a  suitable  tool  for  which  it  is  usable. 
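
As a hedged illustration of using the downloaded GAL locally, the CSV data can be read into an in-memory lookup table. The column names (`Alias`, `EmailAddress`) and the sample rows below are invented for the sketch; the actual layout of the GAL export was not examined in this study and would have to be substituted.

```python
import csv
import io


def load_lookup(csv_file, key_column, value_column):
    """Build a case-insensitive lookup table from an open CSV file."""
    table = {}
    for row in csv.DictReader(csv_file):
        key = row[key_column].strip().lower()
        if key:
            table[key] = row[value_column].strip()
    return table


def resolve(partial_id, table):
    """Return the full address for a partial identity, or None if unknown."""
    return table.get(partial_id.strip().lower())


# Invented sample standing in for a GAL export (hypothetical columns and rows).
sample = io.StringIO(
    "Alias,EmailAddress\n"
    "DoeJ,jane.doe@example.org\n"
    "RoeR,richard.roe@example.org\n"
)
gal = load_lookup(sample, key_column="Alias", value_column="EmailAddress")
print(resolve("doej", gal))
print(resolve("unknown", gal))
```

For a real export, the file would be opened with `open(path, newline="")` and passed to `load_lookup` in place of the in-memory sample. Whether a partial identity from a tracking log actually matches any GAL column remains an open question, as noted above.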

3.5  Intra-DND Email: Requirements Estimates and
Considerations 

In the following, the mailbox counts are order-of-magnitude estimates for an SNA within DND,
with the aim of over-estimation, as discussed in Section 3.4.

1.  Assume five-to-seven servers hosting mailboxes of interest (say, six servers), each serving
1000 users and generating a 25MB log file per day, totalling 150MB/day.

2.  To study email traffic patterns over two weeks (for example) would require processing 2.1GB
of log file data.


Exchange Server is even used. The numbers used in this chapter are order-of-magnitude estimates based
on DND.

25 The ethics and legality of combining GAL data with log files would also need to be investigated. This
is just a specific example of the larger question of whether collecting SNA data in general is ethical and legal,
which remains to be investigated. The answer may depend on the details of how the data is gathered and
prepared as much as it depends on the raw or final SNA data itself. Section 4.6 elaborates on legal compliance
for  tracking  logs  specifically. 
3.  Several tens of mailboxes (up to approximately 100) were expected to be involved in
generating/sharing  incident  reports. 

4. For a rough estimate of the number of traffic volume metrics to generate, M = 50 mailboxes of
interest can be paired up in $P = \binom{M}{2} = M(M-1)/2 = 1225$ ways. Therefore, up to 1225 metrics of email
volume will be generated.


5.  For  P  metrics  of  email  volume,  if  commercial  software  is  used  to  generate  the  metrics  from 
the  log  files,  it  should  ideally  not  require  P  queries,  since  the  log  file  data  is  quite 
voluminous. For example, if parsing of the log file can occur at 100KB/second, the
aforementioned  2.1GB  would  take  approximately  six  hours.  The  actual  time  would  depend 
on  the  operation  of  the  software,  what  it  does  to  identify  records  involving  users  of  interest, 
and  how  much  of  that  is  done  during  parsing  versus  afterward.  In  any  case,  the  application 
should  be  well  developed  enough  that  only  one  pass  through  the  log  data  is  needed,  since 
most  of  the  data  will  not  be  relevant. 
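
The arithmetic behind the estimates above can be reproduced with a short script. The following is a minimal sketch only: the function name and its parameters are illustrative assumptions, and decimal units (1GB = 1000MB, 1MB = 1000KB) are assumed, which is consistent with the rounded figures quoted in the list.

```python
from math import comb


def log_volume_estimate(servers, mb_per_server_per_day, days, mailboxes, parse_kb_per_s):
    """Return order-of-magnitude figures for a tracking-log study.

    All names here are illustrative; decimal units are assumed (1 MB = 1000 KB).
    """
    daily_mb = servers * mb_per_server_per_day
    total_mb = daily_mb * days
    pairs = comb(mailboxes, 2)  # unordered mailbox pairs, M(M-1)/2
    parse_hours = total_mb * 1000 / parse_kb_per_s / 3600
    return {
        "daily_mb": daily_mb,
        "total_mb": total_mb,
        "pairs": pairs,
        "parse_hours": parse_hours,
    }


# Intra-DND figures from the list above: six servers, 25MB/day each, two weeks,
# 50 mailboxes of interest, parsing at 100KB/second.
intra_dnd = log_volume_estimate(
    servers=6, mb_per_server_per_day=25, days=14, mailboxes=50, parse_kb_per_s=100
)
print(intra_dnd)
```

With these inputs the sketch yields 150MB/day, 2100MB over two weeks, 1225 mailbox pairs, and a single-pass parsing time of just under six hours.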


3.6  Interagency  Email:  Requirements  Estimates  and 
Considerations 

In the following, the mailbox counts are order-of-magnitude estimates for mailboxes of interest both within DND and in OGDs, with the aim of over-estimation, as discussed in Section 3.4.

1. Approximately thirty organizations are involved in Arctic security (Annex C) and might be
included in the SNA.

2.  Assume  two  servers  containing  mailboxes  of  interest  per  organization,  yielding  sixty  servers, 
each assumed to serve 1000 users and generating 25MB/day, totalling 1.5GB/day26.

3.  Over  a  two-week  period,  this  means  21GB  of  log  files.  This  doesn’t  account  for  the 
possibility  that  traffic  is  less  on  weekends,  but  again,  this  is  an  order-of-magnitude  estimation 
with  a  leaning  toward  over-estimation  to  provide  margin. 

4.  Several  tens  of  mailboxes  (up  to  approximately  100)  are  expected  to  be  involved  in 
generating/sharing  incident  reports. 

5. Estimate: M = 100 mailboxes require up to $P = \binom{M}{2} = M(M-1)/2 = 4950$ metrics of email volume.

6.  Even  more  critically  than  for  intra-DND  email,  for  P  metrics  of  email  volume,  the  tool  used 
for  data  processing  should  not  require  P  queries  of  the  log  file  data,  since  the  data  set  is 
extremely  large.  (If  queries  are  done  on  a  database  of  records  pertaining  only  to  mailboxes  of 
interest,  however,  this  requirement  may  not  be  as  important). 

7.  The  servers  that  generate  the  log  files  are  not  on  the  same  domain. 


26  Order-of-magnitude  estimates  based  on  DND. 

8. If commercial software is used to compile email volume metrics from tracking logs, it should
be  able  to  maintain  distinct  mailbox  identities,  even  though  the  servers  generating  the  log 
files  are  on  different  domains.27 
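
The two requirements in this list that bear most directly on tooling (one pass over the data rather than P queries, and distinct mailbox identities across domains) can be illustrated with a small sketch. It assumes, hypothetically, that each log record has already been reduced to a sender and a recipient expressed as full `user@domain` addresses; the function name and the example addresses are illustrative and do not come from any real log.

```python
from collections import Counter


def count_pairs(records, mailboxes_of_interest):
    """Tally traffic per unordered mailbox pair in a single pass.

    records: iterable of (sender, recipient) tuples holding full addresses, so
    that the same local name on two domains remains two distinct mailboxes.
    """
    interest = {m.lower() for m in mailboxes_of_interest}
    counts = Counter()
    for sender, recipient in records:
        s, r = sender.lower(), recipient.lower()
        # Records that do not involve two mailboxes of interest are discarded
        # immediately, so no second pass or per-pair query is needed.
        if s in interest and r in interest and s != r:
            counts[frozenset((s, r))] += 1
    return counts


# Hypothetical records; the domains are placeholders.
records = [
    ("a.smith@dept-a.example", "b.jones@dept-b.example"),
    ("b.jones@dept-b.example", "a.smith@dept-a.example"),
    ("a.smith@dept-b.example", "b.jones@dept-b.example"),  # same local name, other domain
    ("c.roy@dept-c.example", "a.smith@dept-a.example"),    # sender not of interest
]
interest = [
    "a.smith@dept-a.example",
    "a.smith@dept-b.example",
    "b.jones@dept-b.example",
]
counts = count_pairs(records, interest)
for pair, n in counts.items():
    print(sorted(pair), n)
```

Keying the tally on an unordered pair means that all P metrics accumulate together during the one pass, whatever the value of P.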


27  If  this  is  not  technically  possible,  then  alternatives  to  tracking  logs  need  to  be  investigated  for  generating 
SNA  data.  This  of  course  has  implications  for  the  feasibility  of  the  interagency  email  SNA  as  a  whole. 

4  Addressing  Requirements  Estimates  and
Considerations 


This  section  documents: 

1. The selection of SNA software

2.  Vetting  of  commercial  software  for  data  gathering  and  preparation 

3.  Aspects  of  previous  attempts  within  DND  to  study  email  traffic  volume,  which  could  inform 
the  method  of  preparing  input  data  for  the  SNA 

4.  Selection  of  a  scripting  language  for  in-house  approaches  to  data  preparation 

5.  Knowledge  gleaned  from  examining  the  example  tracking  log  files,  on  which  approaches  can 
be  based  for  user  identification 

6.  Legal  issues  and  likelihood  of  their  resolution 

7. The suitable computing network for data preparation and the SNA

These points are discussed in Sections 4.1 to 4.7, respectively.

4.1  SNA  Software 

Three  packages  that  are  among  the  most  popular  SNA  software  available  are  UCINET,  Pajek,  and 
NetDraw.  To  circumvent  the  administration  of  purchasing  software,  EXORT  opted  to  use 
freeware,  which  excludes  UCINET.  Pajek  and  NetDraw  both  seemed  well  documented,  so  the 
choice of which to start with was somewhat arbitrary. Pajek (Annex A) had the benefit of having
an  SNA  textbook  based  on  it  [7],  however,  so  it  was  chosen  as  the  package  with  which  to  start 
exploration  of  SNA. 

4.2  Data  Acquisition  Software 

The  initial  plan  was  to  write  scripts  to  preprocess  the  log  file  data  for  input  into  Pajek.  DIMEI’s 
advocacy for Quest®’s MessageStats™ tool prompted the exploration of that avenue as a
possibly  quicker  solution.  MNE28  4’s  use  of  Excel®  for  their  SNA  prompted  the  examination  of 
Excel®-based  solutions.  Discussion  with  SEAMS  on  SQL  Server  requirements  generated  a 
series  of  potential  commercial  alternatives  to  MessageStats™  and/or  SQL  Server.  In  researching 
those  options,  further  commercial  alternatives  were  encountered.  These  investigations  are 
detailed  in  Annex  D. 


28  Multi-National  Experiment. 

After the above investigations, the initial plan to process the data through scripting seemed to be
the most direct, simple, and realizable. It had the fewest uncontrolled dependencies that
could render the solution unusable, e.g., dependence on software vendors, purchasing,
authorization for installation, and installation. Since EXORT would only have access to the
tracking log files, however, the issue of unambiguously identifying sender and recipient (with
confidence) needs resolving for any of the approaches to be workable.

The  following  is  a  summary  of  the  commercial  software  approaches  in  Annex  D,  investigated 
circa  September  2006.  The  accuracy  of  information  on  commercial  products  is  limited  to  the 
accuracy  with  which  the  information  was  provided  in  consultations  with  the  vendors. 

The  use  of  Excel  to  convert  the  raw  log  data  into  an  Access  database  was  limited  by  Excel's  low 
maximum  record  count  (65,536),  the  nontabular  nature  of  the  log  entries,  and  inadequate  user 
identification  data.  The  same  limitations  apply  to  Excel's  "PivotTable"  [8]  feature,  which  was 
used  to  compile  SNA  input  data  in  MNE  4. 

The  raw  log  data  can  be  reduced  in  volume  by  pre-filtering,  and  made  tabular  by  converting 
multi-recipient  email  entries  to  one-to-one  equivalents.  However,  this  pre-processing  phase  can 
also  be  made  to  compile  the  SNA  input  data  (Section  4.4),  thus  obviating  the  need  for  Excel, 
PivotTable, and Access. Unfortunately, it does not solve the problem of obscure user
identification  data. 
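
The pre-filtering and the conversion of multi-recipient entries to one-to-one equivalents can be sketched as follows. This is an illustration under stated assumptions, not the method used in the study: it presumes that each log entry has already been parsed into a sender and a list of recipients, and the function name and sample identifiers are hypothetical.

```python
def prefilter_and_expand(records, of_interest):
    """Yield one (sender, recipient) row per recipient of each record.

    records: iterable of (sender, [recipient, ...]) entries.
    Only rows whose sender and recipient are both of interest are kept,
    which reduces the data volume and makes the output tabular.
    """
    for sender, recipients in records:
        if sender not in of_interest:
            continue  # pre-filter: drop the whole entry early
        for recipient in recipients:
            if recipient in of_interest:
                yield sender, recipient


# Hypothetical parsed entries; "x" stands for a mailbox outside the study.
records = [
    ("a", ["b", "c", "x"]),
    ("x", ["a"]),
    ("b", ["a"]),
]
rows = list(prefilter_and_expand(records, of_interest={"a", "b", "c"}))
print(rows)
```

The resulting rows are one-to-one and could be tallied directly, which is why a separate spreadsheet or database stage becomes unnecessary.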

Of the remaining potential commercial solutions, the most investigated was Quest's
MessageStats, due to advocacy by DIMEI. Despite ample communications, however,
demonstration of its functional suitability (in the absence of the Active Directory) and
affordability was still outstanding, and it was not clear whether EXORT would be able to meet
its requirements regarding database size and software.

The  remaining  commercial  options  were  considered  less  promising  for  one  or  more  of  the 
following  reasons,  in  order  of  decreasing  insurmountability29. 

•  Lack  of  response 

•  Uncertain  functional  suitability  (proper  transformation  of  log  data  to  SNA  input  data) 

•  Access  to  Exchange  servers  required 

•  Access  to  Active  Directory  required 

•  Requirement  for  SQL  Server 

•  Cost 

4.3  Leveraging  DIMEI’s  Experience 

DIMEI30,  which  conducted  the  email  traffic  studies  described  at  the  end  of  Section  2,  raised  the 
possibility  of  leveraging  this  work  in  gathering  data  for  server-to-server  traffic.  After  further 


29  As  subjectively  viewed  by  the  primary  analyst  of  this  SNA. 

30  Discussion  with  Mr.  Messier  (Footnote  16,  page  6). 
investigation, however, it was deemed unsuitable for the SNA, which requires data at the
resolution  of  individual  users.  The  possibility  was  also  raised  of  guidance  from  DIMEI  in 
navigating  and  expediting  the  process  of  approval  in  the  (unlikely)  case  that  EXORT  ended  up 
deploying  tools/applications  onto  the  Exchange  servers. 

4.4  Scripted  Data  Preparation  With  Perl 

If  it  is  assumed  that  tracking  log  files  are  the  only  starting  point,  it  seems  in  hindsight  that  time 
spent  exploring  potentially  more  elegant  commercial  solutions  to  data  preparation  might  have 
been  better  spent  on  the  initial  simple  approach  of  scripting  the  functionality,  low  level  though  it 
may  be.  Two  of  the  most  popular  languages  meant  specifically  for  data  processing  are  Perl  and 
Python.  Interpreters  for  both  are  free.  Perl  is  the  more  mature  of  the  two,  and  extensive 
knowledge  and  ramp-up  material  exists  in  the  public  domain.  Since  the  primary  analyst  of  this 
SNA  had  previous  exposure  to  Perl,  it  was  chosen  as  the  language  with  which  to  process  log  files 
into  Pajek  input  data.  Annex  E  contains  reference  material  for  Perl. 
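
As an illustration of the final step, the sketch below writes a table of pairwise message counts in the basic Pajek network layout (a `*Vertices` block followed by an `*Edges` block with weights). Although Perl was the language chosen for this SNA, the sketch uses Python purely for illustration. The function name and the mailbox labels are hypothetical, and only the simplest form of the Pajek format is assumed.

```python
def to_pajek(pair_counts):
    """Render {(mailbox_a, mailbox_b): message_count} as Pajek .net text."""
    # Pajek refers to vertices by 1-based number, so assign one to each mailbox.
    labels = sorted({mailbox for pair in pair_counts for mailbox in pair})
    index = {label: i for i, label in enumerate(labels, start=1)}

    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Edges")
    for (a, b), weight in sorted(pair_counts.items()):
        lines.append(f"{index[a]} {index[b]} {weight}")
    return "\n".join(lines) + "\n"


# Hypothetical message counts between three mailboxes.
pair_counts = {("alice", "bob"): 5, ("bob", "carol"): 2}
net_text = to_pajek(pair_counts)
print(net_text)
```

The message count is written as the edge weight, so the traffic volume between two mailboxes is available to Pajek as the value of the link between them.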

4.5  Deciphering  Exchange  Server®  Tracking  Log  Files 

As  mentioned,  deciphering  the  user  IDs  in  the  tracking  log  files  is  necessary  because  of  lack  of 
access  to  the  Exchange  servers  (the  Exchange  Server®  applications  running  on  the  host 
machines,  and  associated  report  generation  capabilities),  the  inability  to  deploy  custom 
applications  on  the  host  machines  to  collect  the  required  data  in  a  more  easily  used  form,  and  the 
near  term  barriers  to  installing  commercial  applications  on  the  DWAN  to  access  the  AD.  A 
completely  reliable  method  for  discerning  the  user  IDs  from  the  logs  has  not  yet  been  devised. 
The  information  in  this  section  was  captured  from  efforts  taken  after  the  cessation  of  Polar 
Guardian.  For  future  SNAs  in  which  the  use  of  tracking  logs  is  explored,  this  starting  point 
circumvents  the  bottom-up  discovery  that  has  already  been  paid  for  in  Polar  Guardian  project 
time.  Outstanding  issues  of  uncertainty  include: 

1. The lack of sufficiently comprehensive, publicly accessible documentation on the log files

2.  The  need  for,  and  potential  impediments  to,  high-speed  identification  of  users 

3.  The  question  of  suitability  and  adequacy  of  the  tracking  log  files,  which  appeared  to  be 
plagued  with 

•  A  plethora  of  types  of  potentially  identifying  data,  with  varying  degrees  of  potential 
ambiguity 

•  Unpredictability  in  the  presence  of  these  various  types 

•  Discrepancies  in  their  format 

Most  of  the  effort  was  devoted  to  the  study  of  the  example  Server  5.5  log,  since  that  was  the 
format  of  the  logs  at  the  time.  The  transition  to  Server  2003  was  imminent,  however.  Despite 
that,  information  pertaining  to  both  formats  was  kept  because  the  mail  servers  in  OGDs  are  not 
known. 


One  should  also  keep  in  mind  the  remote  possibility  that  some  OGDs  do  not  use  Microsoft™ 
Exchange  Server®  (of  any  version)  at  all,  since  any  log  files  for  such  servers  will  have  yet 
another  format.  It  is  not  known  whether  the  identification  data  in  non-Microsoft™  server  logs  is 
as  unpredictable  as  in  Exchange  Server®  5.5  tracking  logs. 

The  arcane  nature  of  the  identity  information  in  DND’s  tracking  log  file  could  very  well  depend 
on  the  viewer’s  degree  of  expertise  in  email  protocols.  It  is  conceivable  that  someone  with 
extensive  training  in  email  administration  would  find  the  data  recognizable  without  more  in-depth 
documentation.  It  is  not  known  how  common  such  expertise  is.  For  Polar  Guardian,  exploration 
of  this  possibility  ended  up  in  a  referral  to  Microsoft™  by  DIMEI.  The  ensuing  discussion  led  to 
the  assessment  that  home-grown  deciphering  of  the  tracking  logs  was  not  advisable.  Since  it  is 
the  tracking  logs  that  would  be  conditionally  provided,  however,  forensics  was  performed  on 
example  log  files  to  generate  potential  approaches  to  determine  identity. 

4.5.1  Exchange  Server®  5.5  Tracking  Logs 

Exchange  Server®  5.5  log  files  and  events  are  described  in  Annex  F.  According  to  Microsoft™ 
support,  there  is  no  further,  more  elaborative  documentation.  As  per  DIMEI’s  warning,  the  log 
files  were  indeed  found  to  be  cryptic;  it  was  unclear  for  some  time  whether  sender/recipient  IDs 
could  be  deduced  from  the  records.  Anonymized  examples  from  the  tracking  logs  are  contained 
in  Annex  H.  The  tracking  logs  became  clearer  after  the  cessation  of  Polar  Guardian;  contact  was 
finally  made  with  the  appropriate  personnel  at  Microsoft™  support,  and  the  example  log  files 
were  studied  extensively. 

4.5.1.1  Information  from  Microsoft™

As mentioned, Exchange Server® 5.5 is quite old. Microsoft™ managed to contact a former
engineer  who  had  exposure  to  it  and  was  able  to  provide  the  following  opinions  based  on 
anecdote.  Note  that  the  level  of  expertise  in  email  protocols  of  the  Microsoft™  participants  in  the 
associated  discussions  is  not  known. 

1. No expertise exists in Microsoft™ to convert the log file data into sender/recipient email
addresses, as the technical people who worked on that transitional format moved on circa
2000

2. Had they been available, the solution would likely still have been quite involved

3.  There  was  scepticism  about  the  prospect  of  connecting  to  directory  services  to  resolve  the 
sender/recipient  addresses.  It  wasn’t  clear  whether  it  was  the  actual  connecting  to  the  service 
which  was  difficult,  whether  the  tracking  log  was  deemed  to  have  insufficient  information  to 
resolve  the  sender/recipient  identities  using  the  directory  services,  or  whether  the  directory 
service  was  the  GAL  or  AD. 

4.  Approximately  80%  of  the  transactions  (i.e.  records)  in  the  logs  did  not  represent  delivered 
email;  they  were  intermediate  hops  from  server-to-server,  a  series  of  which  make  up  the  end- 
to-end  delivery  of  email.  (This  simply  meant  that  there  are,  on  average,  four  hops  in  the 
delivery  of  each  email). 

5. The actual delivery of an email could be recognized by the field identifying the event
associated  with  the  transaction.  The  event  should  be  something  to  the  effect  “message 
delivered  to  local  store”. 

Subsequent  consultation  of  the  event  definitions  (Annex  F)  reveals  the  closest  event  description  to 
“message  delivered  to  local  store”  is  “The  MTA  completed  delivery  of  a  message  to  local 
recipients  (usually  through  the  information  store)”  (event#9).  A  study  of  the  sample  log  file 
confirmed  that  such  events  comprise  24%  of  the  records.  The  complementary  event  in  Annex  F 
seems  to  be  #4  (“Message  submission”),  which  comprises  18%  of  the  records.  Both  18%  and 
24%  are  in  line  with  the  estimate  that  80%  of  a  server’s  traffic  merely  uses  the  server  as  an 
intermediate  hopping  point.  The  fact  that  outgoing  messages  are  fewer  than  incoming  messages 
seems  reasonable,  since  a  multi-recipient  message  will  count  once  as  outgoing,  but  many  times  as 
incoming31. 

Since  a  small  minority  of  the  records  are  relevant  (non-hop  events,  respectively  numbered  “9” 
and  “4”  for  receipt  and  dispatch  of  messages),  the  prospects  of  reading  all  the  relevant  data  into 
Excel®  and  using  PivotTable®  (Section  D.2)  to  generate  SNA  input  data  are  improved,  though 
the  caveats  still  apply. 
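
Selecting only the non-hop records can be sketched as a simple filter on the event number. The event numbers 4 and 9 are those identified above; everything else in the sketch is an assumption made for illustration, in particular that a record is one delimited line and that the event number sits in a known column (the column index is passed in rather than asserted, since the field layout is documented in Annex F and not reproduced here).

```python
SUBMISSION, LOCAL_DELIVERY = 4, 9  # event numbers discussed in the text


def endpoint_records(lines, event_column, delimiter="\t"):
    """Yield (event, fields) for submission and local-delivery records only.

    event_column and delimiter are assumptions about the file layout.
    """
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        try:
            event = int(fields[event_column])
        except (IndexError, ValueError):
            continue  # skip malformed or unexpected lines
        if event in (SUBMISSION, LOCAL_DELIVERY):
            yield event, fields


# Hypothetical lines; real tracking-log records carry many more fields.
sample = [
    "msg-1\t4\tsender-data",
    "msg-1\t7\thop-data",
    "msg-1\t9\trecipient-data",
    "not a record",
]
kept = list(endpoint_records(sample, event_column=1))
print([event for event, _ in kept])
```

Because the filter is a generator, the intermediate-hop records are dropped as the file is read and never need to be held in memory.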

The  prospects  of  using  any  of  the  data  preparation  approaches  based  on  tracking  logs  initially 
seemed  to  be  rendered  moot,  however,  by  the  above  assessment  that  sender/recipient  ID  cannot 
be  practically  reverse-engineered  from  the  5.5  tracking  log  files.  Fortunately,  scrutiny  of  the 
sample log file and some research into standards led to some possibilities for identification32, as
discussed  in  the  following  sections. 

4.5.1.2  General  Observations  on  Identification  Data

Annex  I  contains  the  details  of  the  observations  on  identification  data. 

The  prospects  of  identification  are  further  improved  by  the  availability  of  GAL  export  data, 
discovered  after  cessation  of  Polar  Guardian.  From  speaking  with  IT  personnel,  it  was  found  that 
the  GAL  data  is  available  as  a  file  on  DWAN  at 

http://img.mil.ca/natsvcs/cfnoc  hd/DEMS/f  downloads  e.asp.  This  information  improves  the 
feasibility  of  writing  custom  applications  to  perform  identity  translation/completion,  though  as 
this  section  discusses,  challenges  still  remain. 

Many of the identity records in DWAN’s GAL database contain data items vaguely resembling
the cryptic subfields in some of the 5.5 log fields. The GAL database lists this data for each user
as “X.400” data. X.400 is an alternative standard to SMTP (Simple Mail Transfer Protocol), the
protocol for delivering RFC 2822 messages. X.400 has not come into the mainstream, but is
apparently used in the military, intelligence, and aviation.

In  the  example  5.5  log  file  provided  by  DIMEI,  the  X.400-like  data  occurs  not  necessarily  in  the 
fields  for  sender  or  recipient,  but  seemingly  always  in  the  “Message  ID”  field.  Therefore,  it  does 
not  directly  resolve  the  identification  problem.  In  conventional  SMTP,  however,  the  message  ID 


31  Hypothesized  rationale. 

32  Feasibility  and  level  of  effort  need  estimating. 
typically contains information that can help identify the source of email; further consultation with
SMEs  is  needed  to  determine  if  the  message  IDs  in  the  log  files  can  help  resolve  the  identification 
difficulties  described  in  this  report. 

Each  of  the  sender  and  recipient  fields  contains  an  unpredictable  combination  of  data  items, 
presumably  about  the  same  person.  It  remains  to  be  determined  whether  one  person  can  be 
identified  with  different  combinations  of  data  types  in  different  records.  When  present,  certain 
data  items  are  preferable  to  others.  Ranked  in  terms  of  least  to  most  ambiguity  in  identification, 
they  are: 

1. SMTP address

2.  X.400  data,  since  it  may  include  middle  name  initials  (in  contrast  to  X.500  data,  described 
next) 

3. X.500 data, which may contain first and last name only, and possibly not even that. X.500 is
a standard for directory services and supports X.400. In the tracking log data, X.500 data is
often followed by three-digit numbers; these turn out not to be unique to individual users, and
hence do not help resolve user IDs in any obvious way.
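
The ranking above suggests a selection rule: when a field holds several kinds of data, use the least ambiguous kind that is present. The sketch below illustrates such a rule. The patterns are assumptions about what each kind of data looks like (semicolon-separated attributes for X.400-like data, slash-separated components for X.500-like data); they are not taken from the tracking logs, whose actual forms are described in Annex I, and would need to be adapted to the real records.

```python
import re

# Ordered from least to most ambiguous, following the ranking in the text.
# The regular expressions are illustrative guesses at each textual form.
PATTERNS = [
    ("smtp", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("x400", re.compile(r"(?:[a-z]+=[^;/]*;)+[a-z]+=[^;/]*", re.IGNORECASE)),
    ("x500", re.compile(r"(?:/[a-z]+=[^/;]+)+", re.IGNORECASE)),
]


def best_identifier(field):
    """Return (kind, matched_text) for the preferred data item, or None."""
    for kind, pattern in PATTERNS:
        match = pattern.search(field)
        if match:
            return kind, match.group(0)
    return None


# Hypothetical field contents.
mixed = best_identifier("c=CA;s=Doe;g=Jane jane.doe@example.org")
x400_only = best_identifier("c=CA;a= ;p=ORG;o=SITE;s=Doe;g=Jane;i=Q;")
x500_only = best_identifier("/o=ORG/ou=SITE/cn=RECIPIENTS/cn=DOEJ 123")
unknown = best_identifier("MTA-QUEUE-17")
print(mixed, x400_only, x500_only, unknown)
```

A field that matches none of the patterns returns `None`, which corresponds to the records for which no identification scheme is yet known.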

These  data  types  in  the  tracking  logs  were  discerned  by  comparing  the  log  data  with  both  the 
downloaded  GAL  database,  and  the  GAL  as  accessed  from  Outlook.  At  the  time  of  writing,  these 
two  methods  of  accessing  GAL  information  have  yielded  non-identical  information,  irrespective 
of  whether  one  downloads  the  GAL  database  for  Exchange  Server®  5.5  or  2003.  When  accessed 
from  Outlook,  the  GAL  contains  both  X.400  and  X.500  information  for  the  users;  in  contrast,  the 
downloaded  databases  do  not  appear  to  contain  X.500  information. 

If GAL data must be used to identify a user based on X.500 data, therefore, it would be necessary
to be on the DWAN in order to access the online directory rather than use the downloaded GAL.
For example, it may be necessary to investigate whether the X.500 data can be obtained via the AD
rather than the downloadable GAL databases, as well as methods or tools for doing this. Having
such access on the DWAN, however, goes beyond the constraints being observed in this SNA.

4.5.1.3  Observations  on  Non-Intermediate  Hops

Since  the  log  transactions  corresponding  to  intermediate  hops  contribute  only  noise  to  the 
identification  process,  the  records  for  non-intermediate  hops  were  examined  to  see  if  the  data  was 
more  identification-friendly.  These  transactions  are  outgoing  and  incoming  emails;  from  Section 
4.5.1.1, these are likely to be event numbers four and nine, respectively. To distinguish them
from  intermediate  hops,  the  understanding  in  this  terminology  is  that  they  are  outgoing  from,  and 
incoming  to,  mailboxes  rather  than  servers. 

In  principle,  end-to-end  email  within  DND  can  be  culled  from  the  logs  by  taking  only  the 
outgoing transactions or the incoming transactions. Email with organizations outside of DND,
however, requires that both incoming and outgoing emails be examined. Under such a counting
scheme,  internal  email  will  be  double-counted;  conceptually,  however,  correcting  this  is  expected 
to  be  straightforward. 

As detailed in Annex I, records for outgoing messages seem to always have sender first and last
names. Though this does not always identify a user uniquely, the potential ambiguity is limited. In
contrast,  the  recipient  field  often  contains  cryptic  information.  The  reverse  is  true  for  incoming 
messages.  More  research  or  consultation  with  subject  matter  experts  (SMEs)  is  required  to 
determine  whether  identification  can  be  made  for  all  the  cryptic  identifiers. 

An  interesting,  though  theoretical,  possibility  for  avoiding  the  irresolvable  identities  arises  from 
the  fact  that  the  sender  has  good  identification  data  in  outgoing  email,  while  the  recipient  has 
good  identification  data  in  incoming  email.  Their  limited  potential  for  ambiguity  will  be  ignored 
for  the  purpose  of  describing  this  scheme.  The  possibility  for  identification  is  premised  on  the 
assumption  (based  on  the  SMTP  world)  that  the  message  ID  remains  the  same  regardless  of  the 
path  taken  for  message  delivery.  Outgoing  messages  can  be  matched  up  with  incoming  messages 
of  the  same  message  ID,  which  allows  the  identification  of  both  sender  and  recipient.  The 
message  then  contributes  to  the  message  count  for  the  SNA  link  between  their  corresponding 
mailboxes.  There  are  caveats: 

•  It  is  theoretical  because  the  tractability  of  matching  corresponding  incoming  and  outgoing 
email  needs  to  be  investigated,  and  can  be  a  potentially  involved  capability  to  develop  in- 
house 

•  Whether  the  message  ID  is  constant  throughout  the  delivery  process  needs  confirmation 

•  Both  outgoing  and  incoming  records  are  needed 

•  Though  some  of  the  identification  is  good  (sender  in  outgoing  email,  recipient  in  incoming 
email), caveats in Section 4.5.1.2 may still apply.
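
The matching scheme described above amounts to a join on the message ID. The following sketch illustrates it under the same unconfirmed premise stated in the text, namely that the message ID is constant along the delivery path. It further assumes that outgoing records have been reduced to (message ID, sender) and incoming records to (message ID, recipient); these shapes, the function name, and the sample values are hypothetical.

```python
from collections import Counter


def link_counts_by_message_id(outgoing, incoming):
    """Pair senders with recipients through a shared message ID.

    outgoing: iterable of (message_id, sender) from submission records.
    incoming: iterable of (message_id, recipient) from delivery records.
    Returns (Counter of (sender, recipient) links, number of unmatched deliveries).
    """
    # Index the outgoing records once so that each incoming record is
    # resolved with a single dictionary lookup.
    sender_of = {message_id: sender for message_id, sender in outgoing}

    counts = Counter()
    unmatched = 0
    for message_id, recipient in incoming:
        sender = sender_of.get(message_id)
        if sender is None:
            unmatched += 1  # e.g. mail that originated outside the logged servers
            continue
        counts[(sender, recipient)] += 1
    return counts, unmatched


# Hypothetical reduced records.
outgoing = [("id1", "Doe, Jane"), ("id2", "Roy, Paul")]
incoming = [("id1", "Lee, Ann"), ("id1", "Roy, Paul"), ("id3", "Lee, Ann")]
links, unmatched = link_counts_by_message_id(outgoing, incoming)
print(dict(links), unmatched)
```

The count of unmatched deliveries gives a rough indication of how much traffic the scheme fails to attribute, which bears on the tractability caveat listed above.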

4.5.2  Caveats  in  Using  GAL  Information  to  Discern  User  IDs  from 
Exchange  Server®  5.5  Tracking  Logs 

In addition to those described in Section 4.5.1.1, this section discusses caveats that should be
considered  in  devising  an  identification  scheme  that  uses  the  GAL,  including: 

•  Anticipated  high  speed  requirement  for  GAL-assisted  translation 

•  Ambiguity  in  using  first/last  name  of  X.400/500  data  in  the  tracking  log  files 

•  Discrepancy  in  GAL  information 

•  As  an  alternative  to  the  GAL,  manually  drawing  up  a  name  table  to  resolve  user  IDs 

•  Considerations  due  to  the  many  forms  of  user  ID  in  the  tracking  logs 

If  it  turns  out  that  GAL-assisted  user  identification  is  required  to  filter  away  transactions  not 
involving  users  of  interest,  then  the  large  volume  of  raw  log  data  requires  that  the  identity 
resolution  be  very  fast.  This  would  depend,  for  example,  on  the  implementation  details  of  the 
lookup  table  created  from  the  downloaded  GAL  database. 

From the X.400/500-like data examples in Annex I, it seems possible that a complete
identification can be made from the surname and given name. The greatest drawback to this is
that such information is not present in all transaction records. Furthermore, users of the GAL will
realize that first and last names often do not uniquely identify a person. This is the
aforementioned limited ambiguity of identification using first and last names, and can be
mitigated by selecting only servers hosting users in Canada Command and JTFN. It is also
further mitigated in the case where a transaction record has X.400 data that contains the user’s
initials33. There did not appear to be any middle name initials for the X.500 data with which to
mitigate ambiguity.

For  the  tracking  log  records  in  which  they  are  available,  use  of  names  in  the  X.400  data  is 
complicated  by  the  fact  the  X.400  data  is  part  of  the  aforementioned  discrepancy  between  the 
downloaded  GAL  database  and  the  GAL  info  as  accessed  from  Outlook.  For  example,  it  was 
observed  that  compound  names  can  be  hyphenated  in  the  latter,  but  simply  attached  together  in 
the  former.  The  reason  for  this  discrepancy  needs  to  be  determined  before  there  can  be 
confidence  about  any  lookup  table  built  to  resolve  IDs.  If  the  lookup  table  is  built  from  the 
downloaded  GAL  data,  it  would  be  far  more  convenient  if  the  X.400  names  in  the  tracking  logs 
matched  the  downloaded  GAL  data.  This  was  indeed  observed,  but  only  from  a  cursory 
examination. 

Even  if  some  tracking  log  identities  match  the  GAL  as  accessed  from  Outlook,  however,  just 
knowing  that  the  discrepancy  is  consistent  would  allow  for  manually  changing  the  lookup  table  to 
compensate.  For  this  to  be  done,  of  course,  all  such  discrepancies  need  to  be  identified.  The 
level  of  effort  to  do  this  needs  to  be  scoped  out. 

The  above  example  of  manually  tailoring  the  lookup  table  of  first/last  names  can  be  taken  to  the 
extreme;  if  the  user  names  in  the  logs  are  always  the  same,  it  is  possible  to  forego  basing  the 
lookup  table  on  the  downloaded  GAL  database  by  manually  drawing  up  the  translation  data  based 
on  the  names  in  the  logs.  This  is  only  tractable,  of  course,  if  the  number  of  users  of  interest  is 
small  enough.  Again,  this  will  only  help  for  the  records  in  which  the  proper  X.400/500  data  is 
available  i.e.  those  that  include  first  and  last  names. 

Despite  these  workarounds  to  the  specific,  observed  examples  of  problematic  data,  a  major 
challenge  is  the  fact  that  user  ID  fields  contain  a  mixture  of  different  types  of  information  that  is 
highly  varied  across  transaction  records.  This  complicates  the  devising  of  an  identification 
scheme.  To  account  for  the  various  possibilities,  it  is  likely  that  the  final  scheme  will  take  longer 
to  implement  and  be  more  computationally  demanding  i.e.  run  slower. 

The  absence  of  X.500-like  data  in  the  downloaded  GAL  database  would  render  that  information 
in  the  tracking  logs  unusable  for  GAL-based  identification.  For  records  containing  X.500  data, 
the  only  way  to  use  the  data  would  be  to  base  the  identification  on  first  and  last  name  only,  using 
a  custom  lookup  table  built  for  such,  as  described  above.  Such  a  solution  also  inherits  the 
potential  ambiguity  in  identification  based  on  first  and  last  name. 

A  solution  to  the  construction  of  a  lookup  table  is  possible  if  the  different  ways  of  identifying  a 
user  are  predictable  and  limited.  A  simple  first  approach  might  be  to  have  multiple  entries  of  the 
lookup  table  translate  the  data  into  a  common  user.  The  as-yet  unsolved  challenge  is  ensuring 


33  For  the  purpose  of  this  report,  the  user  account  is  taken  to  be  the  most  unambiguous  identification.  It  is 
far  from  impossible  in  an  organization  the  size  of  DND/CF  to  have  people  with  same  first  and  last  names, 
and  possibly  even  the  same  middle  initial  (though  less  likely,  of  course). 
that all possible variations of a user are identified. They will then occupy separate rows in the
lookup  table. 
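
A lookup table of this kind can be sketched as a dictionary in which several variants of a name map to one user, which also gives the constant-time lookups that the data volume calls for. The sketch is illustrative only: the normalization rule (fold case, drop hyphens, spaces, and punctuation) is an assumption intended to absorb the hyphenation discrepancy noted earlier, and the user keys and names are hypothetical.

```python
def normalize(name):
    """Fold case and drop non-alphanumerics so 'Smith-Jones' equals 'SmithJones'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def build_lookup(variants_by_user):
    """Map every known variant of a name to its user.

    A variant claimed by two different users is stored as None, so that an
    ambiguous name is reported as unresolved rather than silently misassigned.
    """
    table = {}
    for user, variants in variants_by_user.items():
        for variant in variants:
            key = normalize(variant)
            if key in table and table[key] != user:
                table[key] = None
            else:
                table[key] = user
    return table


def resolve(table, raw_name):
    """Return the user for raw_name, or None if unknown or ambiguous."""
    return table.get(normalize(raw_name))


# Hypothetical variants as they might appear in different sources.
table = build_lookup({
    "user-001": ["Smith-Jones, Jane", "SmithJones, Jane"],
    "user-002": ["Doe, John"],
})
print(resolve(table, "SMITH-JONES, JANE"), resolve(table, "Roy, Paul"))
```

The sketch does not address the unsolved part of the problem, which is discovering every variant of a user in the first place.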

The  forms  of  “identification  data”  for  which  accommodations  cannot  be  made,  however,  even  in 
concept, are those examples in Annex I that have no known association with a user, either
through  the  presence  of  first  and  last  names,  SMTP  address,  or  some  discernible  aliasing. 

4.5.3  Exchange  Server®  2003  Tracking  Logs 

If  SNA  is  performed  on  Exchange  email  in  the  future,  it  is  quite  likely  that  the  tracking  logs  will 
be  in  Exchange  Server  2003  format.  This  is  described  in  Annex  G.  With  the  rare  exception,  the 
events  for  Exchange  Server  2003  are  defined  almost  identically  to  those  in  Exchange  Server®  5.5 
over  the  range  of  event  numbers  for  5.5.  It  was  noted,  however,  that  the  2003  events  have  a  series 
of  new  events  above  the  range  of  the  5.5  events,  and  they  are  almost  exclusively  described  as 
SMTP  related. 

As  mentioned,  initial  examination  of  a  sample  log  file  from  DIMEI  seemed  to  indicate  no 
complications  in  identifying  sender  and  recipient;  all  IDs  seemed  to  be  SMTP  addresses. 

However,  the  log  file  was  quite  large  (85MB,  over  300K  transactions),  and  a  deeper  examination 
shows  that  the  sender/recipient  IDs  suffer  the  same  kind  of  problem  as  the  5.5  log,  though  to  a 
lesser  degree.  That  is,  some  of  the  IDs  are  not  recognizable  email  addresses,  though  most  are 
SMTP  format.  Many  such  records  appear  to  be  X.500  data,  but  not  all. 

It  is  possible  that  the  differing  degrees  of  missing  email  addresses  between  the  example  logs  for 
Exchange  Server®  5.5  and  2003  are  due  to  different  roles  in  the  servers  that  generated  them,  and 
are  not  merely  due  to  the  version  of  Exchange  Server®  software.  This  suspicion  arises  from 
studying  the  ranges  and  distributions  of  the  event  numbers.  As  shown  in  Annex  F,  the  5.5  log 
files  have  event  numbers  discontinuously  defined  in  the  two  ranges  0-52  and  1000-1018.  The 
descriptions  for  the  high  range  of  events  seem  to  deal  almost  exclusively  with  SMTP  and 
nonlocal  delivery,  in  apparent  contrast  with  events  in  the  low  range.  Only  14%  of  the  records  in 
the  5.5  example  log  have  events  in  the  high  range,  and  all  such  events  are  numbered  exactly  1000 
(which  indicates  that  sender  and  recipient  occupy  the  same  server).  This  can  be  contrasted  with 
the  example  2003  log,  in  which  all  events  are  well  distributed  above  1018;  this  range  is  not 
defined  for  5.5  logs  and  deals  almost  exclusively  with  SMTP.  Because  of  the  different  types  of 
events,  it  is  possible  that  the  2003  log  file  came  from  a  server  playing  a  more  specialized  role. 
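
The comparison of event-number distributions can be sketched as follows. This is an illustration only: the range boundaries are those quoted above for the example logs, while the function names, the sample event number 4, and the sample data are assumptions made for the sketch.

```python
from collections import Counter


def classify_event(event_id):
    """Bucket an event number using the ranges reported for the example logs."""
    if 0 <= event_id <= 52:
        return "low (0-52)"
    if 1000 <= event_id <= 1018:
        return "high (1000-1018)"
    if event_id > 1018:
        return "above 1018 (2003 only)"
    return "undefined"


def range_shares(event_ids):
    """Return the fraction of records falling in each event-number range."""
    counts = Counter(classify_event(e) for e in event_ids)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Invented sample mirroring the 5.5 example: 86 low-range records and
# 14 records numbered exactly 1000.
sample_55 = [4] * 86 + [1000] * 14
shares = range_shares(sample_55)
```

A log dominated by a single bucket, as with the 2003 example, would be a cue to ask what role the originating server plays before treating the log as representative.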

Caution  is  warranted,  therefore,  in  assuming  how  representative  the  example  log  files  are.  If  the 
2003  example  log  came  from  a  gateway  server  that  connects  DND  to  the  outside  world,  for 
example,  a  logical  question  is  whether  the  log  can  provide  any  insights  at  all  into  the 
decipherability  of  logs  for  user-hosting  servers.  In  fact,  answers  are  needed  to  the  questions  of 
whether  mailbox  hosts  and  gateways  require  separate  servers,  and  what  other  server  roles  may 
and  may  not  share  the  same  server  as  mailbox  hosts,  and  of  those  that  may,  whether  any 
transactions  are  logged  that  are  irrelevant  to  identifying  end-to-end  email  volume.  Such  log 
entries  would  appear  to  obfuscate  the  log  files  but  would  actually  be  ignored  in  front-end 
filtering.  They  do  not  add  to  complications  in  user  identification  for  end-to-end  email,  but  do 
increase  the  volume  of  front-end  data.

4.6  Legal  compliance

Prior  to  cessation  of  Polar  Guardian,  C&S  was  in  the  process  of  determining  the  appropriate 
persons  with  whom  to  consult  for  legal  approval  to  receive  data  on  DND  email.  Information 
about  the  appropriate  persons  was  still  forthcoming,  but  C&S  felt  that  approval  would  have  been 
readily  obtained,  if: 

1.  EXORT  and/or  C&S  can  give  assurances  to  DIMEI  that  access  to,  and  use  of,  the  data  in  their 
possession  would  be  controlled  e.g.  by  stipulating  which  team  members  will  be  allowed 
access  to  the  data,  under  what  conditions,  and  how  those  individuals  will  handle  the  data  to 
prevent  inadvertent  proliferation,  including  but  not  limited  to  the  media,  networks,  and  hosts 
which  can  carry  the  data. 

2.  There  would  be  commitments  to  safeguard  against  abuse  of  privacy. 

Analysts  might  also  inquire  with  DIMEI  (footnote  16,  p.  6)  for  guidance  in  navigating  the  issues 
of  approval,  since  his  group  must  have  had  to  deal  with  them  in  obtaining  the  logs  for  his  study. 
There  are  various  types  of  server  logs,  however,  and  it  should  be  confirmed  that  the  logs  used  by 
DIMEI  are  the  same  as  the  tracking  logs  that  DIMEI  has  made  conditionally  available  for  the 
SNA. 

Since  DIMEI  will  conditionally  provide  the  tracking  log  files  with  no  pre-processing  or  filtering, 
the  assurances  of  due  diligence  and  legal  approval  should  cover  more  than  just  the  use  of  the  data 
for  the  SNA.  They  must  also  cover  the  fact  that  all  of  the  data  in  the  raw  log  files  will  be  delivered,  even
though  most  of  it  will  be  ignored.  This  is  one  of  the  reasons  why  obtaining  more  explanatory 
documentation  of  the  log  files  was  so  important;  it  allows  a  clear  description  of  what  the  data 
means,  and  what  can  be  inferred  from  it. 

Aside  from  DND  email,  the  prospects  of  obtaining  necessary  approvals  from  OGDs  are  another 
matter  still  to  be  investigated.  It  is  not  expected  to  be  straightforward. 

4.7  Data  Preparation  Outside  of  DWAN 

The  DWAN  policies  against  installation  of  non-standard  software  are  quite  restrictive.  Since  the 
options  for  meeting  the  data  preparation  requirements  were  speculative,  it  would  be  necessary  to  use
a  more  flexible  computing  environment  such  as  DRENET  so  that  the  required  applications  can  be 
installed  to  accommodate  the  possible  contingencies.  Local  admin  rights  to  install  software  and 
computing/analysis  environments  on  DRENET  are  sometimes  granted  on  a  case-by-case  basis, 
depending  on  the  need. 

An  initial  concern  was  the  possibility  of  restrictions  on  where  the  acquired  log  files  can  be  hosted. 
If  they  had  to  remain  on  DWAN,  the  ability  to  implement  the  data  preparation  approaches  would 
be  severely  limited.  DIMEI  has  confirmed  that  analyzing  the  data  on  DRENET  is  not  a  problem, 
however,  as  long  as  legal  approval  to  access  the  data  is  obtained. 

Aside  from  DRENET,  the  data  preparation  and/or  the  subsequent  SNA  on  the
resulting  traffic  metrics  can  be  performed  on  a  stand-alone  system.  In  future  SNAs  that  EXORT
may  conduct,  the  work  can  also  be  done  on  the  cubicle-area  network  that  was  being  planned  by
EXORT  at  the  time  of  writing.

5  Lessons  Learned


Section  3  presented  requirements,  estimates,  and  considerations  that  arise  from  considering  likely 
numbers  of  users,  servers,  and  organizations  for  the  SNA,  as  well  as  the  user/server  organization 
within  DND.  Section  4  vetted  tools  and  approaches  for  preparing  input  data  for  the  SNA,  and 
captured  characteristics  and  implications  in  the  unfavourable  case  where  tracking  logs  are  used  to 
generate  this  data.  Issues  on  legal  approval  and  migration  of  data  to  a  hospitable  analysis 
environment  were  touched  upon. 

In  addition  to  overall  lessons,  this  section  discusses: 

•  Activities  that  should  be  initiated  as  early  as  possible  in  an  SNA  study  because  of  potentially 
long  resolution  times 

•  Caveats  related  to  commercial  solutions  for  data  preparation 

•  The  need  for  access  to  subject  matter  expertise  in  email  administration 

•  Anticipations  for  an  interagency  SNA 

•  Suggestions  for  a  follow-up  survey 

The  lessons  and  recommendations  extrapolate  from  the  experience  in  Polar  Guardian’s  SNA  and 
can  therefore  be  speculative  in  nature.  This  is  particularly  true  in  view  of  the  fact  that  many  of  the 
issues  result  from  incomplete  information,  and  the  planning  of  courses  of  action  for  the 
contingencies.  The  questions  raised  might  help  direct  future  efforts.  Much  of  the  discussion 
revolves  around  the  need  for  subject  matter  expertise  in  assessing  options  for  data  preparation. 
Within  the  context  of  a  future  effort  (resources,  time  frame,  authority,  constraints),  a  SME34  can 
help  determine  which  issues  are  resolvable  trivially  or  with  a  reasonable  amount  of  research. 

Some  issues  might  not  be  resolvable  within  the  limits  of  the  project  or  might  not  be  relevant  in 
the  circumstances  of  the  study.  Some  speculations  may  also  be  off-base,  but  awareness  of  their 
possibility  could  result  in  more  incisive  discussions  with  solution  vendors  or  in-house  personnel. 

5.1  Reconsidering  the  Constraints,  Time  Frame,  and 
Approaches 

•  When  studying  electronic  communication  logs  (as  opposed  to  using  surveys),  SNA  requires 
a  lot  of  high-volume  data  processing.  Data  source  collection  planning  must  be  thoroughly 
considered  in  advance. 

This  is  not  merely  an  observation  from  the  Polar  Guardian  effort.  It  is  corroborated  by  a
discussion  with  DIMEI  (footnote  16,  p.  6)  about  his  server-to-server  traffic  study,  and  by
communication  with  the  social  network  analyst  for  MNE  4.

•  In  the  context  of  thoroughly  considering  the  data  collection,  the  whole  approach  of  using 
only  the  tracking  log  files  for  email  traffic  volume  needs  re-examination. 


34  SME:  subject  matter  expert.

The  use  of  tracking  logs  was  based  on  the  premise  that  email  header  information  was
captured  in  those  logs.  When  this  turned  out  not  to  be  the  case,  the  prospects  of  user 
identification  became  extremely  unclear.  Most  of  the  effort  was  then  devoted  to  exploring 
possible  methods  for  identification  of  users,  as  consistently  and  reliably  as  possible,  but  in  a 
short  time  frame.  This  precluded  installing  home-grown  or  commercial  applications  on 
DWAN  or  the  mail  servers,  since  it  incurs  the  risk  and  delay  of  authorization,  and  cost  in  the 
case  of  commercial  solutions.  Challenges  and  uncertainties  remain. 

•  Considering  the  feasibility  challenges  that  arise  from  the  above  constraints,  it  is  highly 
advisable  to  expand  the  time  horizon  for  data  preparation  and  further  investigate  options  that 
involve  data  collection/preparation  done  at  least  partly  on  the  mail  servers  and/or  by 
applications  on  DWAN  machines  that  can  access  network  services  such  as  the  AD. 

•  Though  it  costs  time  and  may  be  expensive  to  install  applications  on  the  mail  servers  or 
DWAN,  and  approval  may  be  uncertain,  it  is  in  no  way  clear  that  less  time  would  be  needed 
to  solve  the  user  identification  problem  with  the  tracking  logs  in  isolation  (with  the  possible 
help  of  downloaded  GAL  data).  For  the  latter,  reliable  and  consistent  identification  may 
even  be  infeasible,  as  indicated  by  discussions  with  Microsoft™  and  two  solution  vendors. 

•  How  the  data  gathering  and  preparation  might  be  implemented  on  the  mail  servers  requires
further  investigation.

■  Discussion  with  DIMEI  (footnote  16,  p.  6)  may  provide  a  starting  point.

■  The  investigation  should  include  finding  out  what  the  tools  that  come  with 
Exchange  Server®  are  capable  of,  what  suitable  software  may  reside  on  the 
server  machine  (if  that  is  possible)  as  a  peer  application  with  the  server,  and 
whether  suitable  applications  can  be  written  on  top  of  the  functionality  provided 
by  server  tools. 

•  Software  to  be  installed  on  DWAN  to  access  the  AD  could  face  smaller  approval  barriers 
and  less  delay  than  implementing  solutions  on  the  server,  and  should  be  reconsidered. 

Several  commercial  packages  were  rejected  because  they  needed  such  access  to  the  AD  to 
identify  users  in  the  log  records.  With  a  longer  time  horizon,  it  is  recommended  that  social 
network  analysts  re-visit  these  options. 

•  In  an  SNA  of  electronic  communication  where  user  identification  data  is  not  an  issue,  if  the
data  set  is  small  enough,  Excel®’s  PivotTable®  feature  can  be  used  for  the  task  of
data  preparation  with  minimal  fuss.
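
For readers without Excel® at hand, the pivot operation amounts to counting messages per sender/recipient pair. A minimal Python sketch follows, assuming the records have already been reduced to resolved (sender, recipient) pairs; the account names and the function name are invented for illustration.

```python
from collections import Counter

# Invented, already-identified records: (sender account, recipient account).
records = [
    ("doe.j2", "roe.j"),
    ("doe.j2", "roe.j"),
    ("roe.j", "doe.j2"),
    ("doe.j2", "poe.e"),
]


def traffic_matrix(pairs):
    """Count messages per directed sender/recipient pair, as a pivot table would."""
    counts = Counter(pairs)
    users = sorted({u for pair in pairs for u in pair})
    # Rows are senders, columns are recipients; absent pairs count as zero.
    matrix = [[counts[(s, r)] for r in users] for s in users]
    return users, matrix


users, matrix = traffic_matrix(records)
```

The rows (senders) and columns (recipients) together form a weighted adjacency matrix of email volume, which is the kind of input the SNA requires.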

5.2  Things  to  Start  Addressing  Early  in  a  Study 

•  In  an  SNA  of  electronic  communication,  insist  on  getting  sample  data  up-front  to  check  the 
usability  of  the  data  and  the  level  of  effort  required  to  make  it  usable. 

As  illustrated  in  Polar  Guardian,  the  request  for  information  can  sometimes  be  interpreted  in 
a  manner  that  was  not  anticipated,  and  the  reality  is  that  the  usability  is  much  less  than  one 
may  have  been  led  to  believe.  If  making  the  data  usable  takes  much  more  effort  and 
involves  much  more  uncertainty  than  originally  envisioned,  then  the  SNA  itself,  the 
approach  to  gathering  data,  and/or  the  timeframe  needs  re-examination.

•  Get  the  approval  for  accessing  the  data  from  the  responsible  authority,  in  case  any
time-consuming  preconditions  need  satisfying.

•  Until  data  is  actually  received,  even  expressed  approval  should  be  viewed  merely  as  intent. 
Nontrivial  conditions  may  be  imposed  at  any  time.  For  SNA  in  particular,  the  use  of  data  on 
interpersonal  interactions  inevitably  raises  issues  of  security  and  privacy.  In  the  case  of 
Polar  Guardian,  the  initial  approval  for  data  was  later  prefaced  with  a  requirement  for  legal 
assurance  to  address  those  issues. 

•  Initiate  inquiry  into  legal  approval  for  obtaining  data  on  electronic  communications  early  in 
the  study,  including  inquiry  concerning  what  legal  authorities  need(s)  to  be  contacted.  It 
could  take  some  time.  Due  to  the  sensitive  nature  of  email  data,  future  SNAs  based  on 
email  log  data  would  almost  certainly  elicit  similar  concerns  (as  opposed  to  SNA  of 
communications  in  experiment-specific  common  operating  environments).  Beyond  legality, 
consultation  with  Director  Military  Personnel  Operational  Research  and  Analysis 
(DMPORA)  is  needed  to  clarify  the  restrictions  and  obligations  arising  from  ethical 
considerations35. 

•  The  identities  of  those  people  whose  communications  are  relevant  to  the  SNA  are  required
ahead  of  time  to  determine  the  number  of  servers  on  which  they  reside,  the  expected  volume 
of  data,  and  possibly  even  the  cost  of  any  commercial  software  licenses.  Getting  these 
names  depends  on  responses  from  different  people,  and  can  take  time. 

5.3  Caveats  for  Commercial  Packages  for  Data  Preparation 

•  Until  one  actually  sees  a  commercial  solution  perform  the  preparation  of  the  required  data, 
the  stated  or  implied  suitability  of  a  tool  or  approach  cannot  be  taken  for  granted. 

■  An  analyst  can  describe  functional  requirements  to  potential  tool  vendors,  but 
they  are  not  always  carefully  read,  or  the  suitability  of  a  candidate  tool  may  be 
casually  posited. 

■  In  the  case  of  MessageStats™,  confirmation  of  cost  and  functional  suitability
was  reported  as  forthcoming  from  the  vendor  for  quite  some  time,  as  was  the  demo,  but
neither  materialized.

■  Additional  requirements,  such  as  the  need  for  SQL  Server  in  several  cases,  could 
delay  or  stall  a  solution.  SQL  Server  in  particular  seems  to  be  a  commonly 
encountered  requirement.  Its  absence  in  an  organization  that  deals  with 
generating  and  analyzing  data  might  not  indicate  a  shortcoming  in  the 
commercial  solution  so  much  as  it  indicates  a  capability  requirement  in  the 
performing  organization.  The  IT  department  for  CFEC  (SEAMS)  has  expressed 


35  Within  DND,  a  former  incarnation  of  an  oversight  body  for  ethical  research  involving  human  subjects  is 
described  in  [9].  This  is  a  research  ethics  board  involving  Director  Military  Personnel  Strategy  (DMP 
Strat)  and  Centre  for  Operational  Research  and  Analysis  (CORA).  This  role  now  falls  to  DMPORA,  which 
also  coordinates  surveys  across  DND  to  avoid  over-surveying  segments  of  personnel.  Such  oversight 
would  be  essential  for  a  survey-based  SNA.

recognition  of  a  vendor-agnostic  need  for  this  capability36,  beyond  the  free  (and
more  limited)  alternatives. 

•  Despite  the  potential  advantage  of  commercial  tools,  there  are  components  of  time  overhead 
and  risk  to  be  taken  into  consideration,  as  per  the  following  examples. 

■  The  search  for  candidates  and  the  exploration  and  confirmation  of  their 
suitability  can  take  quite  a  bit  of  time. 

■  The  approval  for  their  purchase  and  installation  introduces  further  risk  and  delay; 
this  also  applies  to  any  prerequisite  software  not  already  available. 

■  If  the  software  needs  to  be  installed  on  DWAN  specifically  e.g.  to  access  the 
AD,  the  approval  for  installation  introduces  more  risk  and  delay  than  if  it  can  be 
installed  and  used  on  a  less  restrictive  environment  for  research  and  data 
analysis,  such  as  DRENET.  (For  the  SNA,  however,  the  AD  of  interest  is  on 
DWAN). 

■  Any  deployment  of  data  gathering  applications  on  the  Exchange  servers 
themselves  introduces  significant  risk  and  delay.  Depending  on  the  capabilities 
that  come  with  Exchange  Server®,  however,  it  might  be  possible  to  get  data  in 
suitable  form  without  purchasing  commercial  reporting  tools. 

These  factors  need  to  be  investigated  and  weighed  in  determining  which  tools  will  serve  as 
alternatives  to  a  complete  in-house  scripting  approach. 

•  The  robustness  of  commercial  solutions  needs  to  be  assessed. 

Some  of  the  relevant  records  in  the  mail  server  logs  look  like  they  wouldn’t  be  amenable  to 
any  kind  of  identification.  If  deep  knowledge  about  the  server  logs  is  as  rare  as  Microsoft™ 
suggests,  then  it  is  conceivable  that  such  records  are  simply  ignored  by  commercial 
packages.  This  raises  the  question  of  whether  they  are  any  better  than  an  imperfect  home-grown
solution  that  potentially  uses  GAL  data  rather  than  the  AD,  and  ignores  the
irresolvable  identities. 

5.4  General  Access  to  Expertise  in  Email  Administration, 
Exchange  Server®,  and  Directory  Services 

•  For  future  planning  purposes,  note  that  end-user  access  to  Microsoft™  support  is  controlled 
relatively  carefully  within  DND  and  may  take  time  to  initiate. 

A  series  of  communications  within  DND  can  be  expected  before  making  contact  with 
Microsoft™,  after  which  a  series  of  communications  can  be  expected  in  order  to  reach  the 
person  with  the  relevant  expertise.  Before  relying  on  this  path,  be  aware  that  it  could  take 
weeks.  It  can  be  necessary,  however,  if  there  is  lack  of  in-house  expertise,  lack  of  time  on 
the  part  of  any  in-house  experts,  or  understandably,  lack  of  priority  given  to  research 
projects  relative  to  operational  needs. 


36  While  need  for  the  generic  capability  was  recognized,  there  was  care  to  articulate  this  need  in  a  manner 
that  was  independent  of  a  particular  tool  or  product  so  as  not  to  unduly  bias  the  future  choice  of  a  solution.

•  It  is  highly  beneficial  to  have  involved  in  the  SNA  a  SME  in  Exchange  Server®  and  email
protocols,  and  to  have  access  to  this  person37.  It  would  be  even  better  for  responsibility  to 
the  SNA  to  be  part  of  the  work  plan  of  a  person  within  the  DIMEI  hierarchy.  The  impact 
that  this  would  have  on  an  email-based  SNA  is  highlighted  by  the  following  circumstances. 

•  In  Polar  Guardian,  key  information  was  encountered  in  a  piecemeal  and  random  manner,  or
encountered  in  web  searches  to  decipher  the  log  file.

Examples  include  information  about  email  protocols,  directory  services  (such  as  that  for
user  identification),  Exchange  Server®,  and  the  likely  organization  of  users  among  servers.
Having  direct  access  to  a  SME  would  greatly  accelerate  this  indirect  method  of  bottom-up 
knowledge  building.  Without  such  an  expert,  much  of  the  information  in  this  report  would 
not  have  been  acquired  were  it  not  for  fortuitous  networking  and  the  taking  of  every 
opportunity  to  direct  conversation  toward  the  topic  of  email  administration. 

•  In-house  personnel  who  were  able  to  contribute  knowledge  had  operational  responsibilities 
that  precluded  the  necessary  involvement  in  the  planning  of  data  collection,  both  in  terms  of 
degree  and  timeliness. 

Therefore,  in  addition  to  the  networking  overhead  to  find  the  right  persons  to  connect  with, 
there  is  also  overhead  in  continually  maintaining  tactful  communications  until  their
operational  priorities  are  adequately  dealt  with  to  permit  response  on  the  fragments  of
information  being  sought. 

•  Access  to  required  knowledge  can  be  hampered  because  direct  communication  with  the 
most  relevant  in-house  experts  rather  than  the  formal  point  of  contact  (POC),  without 
formally  established  channels  of  communication,  is  not  always  appropriate. 

Key  information  about  the  server  versions  in  DND  and  their  transition  in  the  near  future,  for 
example,  would  never  have  been  encountered  were  it  not  for  a  conversation  with  a 
contractor  in  DIMEI;  such  discussions  constitute  lateral  communication.  Basically, 
"legitimate"  and  approved  channels  of  communication  would  not  have  allowed  the  gleaning 
of  important  facts  because  one  has  to  know  what  to  ask  for  before  fielding  the  right 
questions  through  the  formal  POC,  higher  up  in  the  chain  of  command.  The  extra 
intervening  links  also  increase  delay,  with  multiple  follow-ups,  since  the  POC  may  be  more 
senior,  quite  busy,  and  have  operational  crises  to  solve.  Timely  confirmation  of  information 
from  the  SME  is  not  possible;  this  leads  to  nontrivial  miscommunications,  such  as  positing 
that  the  tracking  logs  contained  email  headers. 

•  Much  of  the  required  knowledge  of  email  and  networks  is  not  necessarily  cutting  edge,  but  is
in  a  highly  specialized  domain  and  sometimes  difficult  to  find,  access,  or  utilize.

Documentation  is  sometimes  not  obtainable  publicly,  or  requires  augmentation  with 
expertise  and  experience  to  be  quickly  meaningful  to  those  outside  of  the  field  e.g.  standards 
for  mail  protocol  and  networking;  information  about  Exchange  Server®  and  events. 


37  The  SME  should  be  ready  for  heavy  involvement  when  needed,  not  merely  someone  whom  the  analyst  is 
referred  to,  and  whose  official  responsibility  does  not  include  support  of  the  SNA.  Even  officially 
supporting  the  project  on  paper  is  not  sufficient  if  such  support  always  takes  a  back  seat  to  operational 
priorities,  since  there  always  seem  to  be  many  more  of  those  priorities  than  there  is  time  for.

•  On  the  other  hand,  email  administration,  Exchange  Server®,  and  directory  services  have
large  amounts  of  technical  knowledge  behind  them,  which  can  also  impede  the  search  for 
information. 

Because  of  the  sometimes  voluminous  documentation,  the  prospects  of  identifying  the 
specific  information  to  resolve  a  particular  issue  could  be  quite  small  without  expert 
guidance.  For  those  not  in  the  field,  it  is  sometimes  unclear  that  the  right  information  is 
being  looked  at,  even  after  it  has  been  located. 

•  Access  to  subject  matter  expertise  is  required  in  all  aspects  of  implementing  data  gathering 
and  preparation  on  the  mail  servers,  the  options  for  which  are  speculated  in  Section  5.1. 

■  Expertise  is  needed  to  confirm  whether  suitable  capability  actually  exists  on  the 
servers,  as  well  as  the  expertise  and  effort  needed  to  utilize  it  for  generation  of 
SNA  input  data. 

■  Since  implementing  solutions  on  the  servers  can  be  a  sensitive  issue,  subject 
matter  expertise  would  also  inform  an  estimate  of  the  degree  of  imposition  at  the 
servers  to  do  this,  which  affects  the  likelihood  of  approval. 

■  If  in-house  implementation  of  such  a  solution  is  required,  the  need  for  Exchange 
Server®  expertise  will  also  be  likely. 

•  It  is  conceivable  that  cost  and  approval  barriers  rule  out  all  options  for  data  preparation
except  for  the  culling  of  data  from  the  tracking  logs  only,  with  possible  utilization  of 
downloaded  GAL  data  to  identify  users  rather  than  online  access  to  DWAN’s  AD.  An 
informed  assessment  of  the  approach  is  then  needed,  based  on  expert  knowledge  of  the 
following: 

■  The  circumstances  under  which  the  different  combinations  of  identity 
information  arise  in  the  tracking  log  files 

■  Why  the  X.500  data  is  missing  disambiguating  middle  initials,  while  they  seem 
to  be  present  in  at  least  some  of  the  X.400  data. 

■  The  limits  on  reliability  of  any  empirically  based  identification  scheme.  Even  if 
great  effort  was  expended  to  concoct  byzantine  schemes  that  border  on  forensics 
or  reverse  engineering,  and  that  seem  to  resolve  all  the  relevant  identities  in  the 
example  log,  there  is  no  guarantee  that  they  will  work  for  similar  data  in  the 
deluge  of  logs  for  the  actual  SNA.  Nor  is  there  guarantee  that  new  forms  of 
cryptic  data  will  not  be  encountered.  This  is  particularly  true  in  view  of  the  fact 
that  log  data  from  OGDs  have  not  been  studied.  The  unreliability  of  empirically 
crafted  identification  schemes  is  the  consequence  of  the  bottom-up 
characterization,  since  it  is  not  necessarily  known  why  the  data  is  generated  in 
the  manner  that  it  is.  Though  Microsoft™  indicated  that  not  many  people  today 
know  the  inner  workings  of  the  older  server  logs,  a  SME  would  have  more 
experience  on  which  to  base  opinion  about  the  reliability  of  an  empirical 
identification  scheme. 

■  As  a  basis  for  developing  an  empirical  identification  scheme,  it  is  advisable  to
get  a  more  experienced  opinion  about  how  representative  the  example  log
files  are  (or  to  use  actual  log  files  from  the  servers  of  interest,  if  they  are  available  at
the  time  the  scheme  is  being  devised).  For  Polar  Guardian’s  SNA,  the  sample
log  file  for  Exchange  Server®  2003  was  especially  in  need  of  this  appraisal. 

■  As  per  the  above  comment  on  expert  access  to  arcane  information,  an  email 
administrator  would  likely  have  access  to  crucial  documentation,  such  as  the 
event  definitions  for  Exchange  Server®  2003.  This  is  necessary  to  select  the 
proper  records  from  the  log  file.  It  also  provides  clues  about  the  problems  that 
need  advance  recognition;  for  example,  the  questionable  representativeness  of 
the  2003  sample  log  should  be  recognized  before  any  effort  is  spent  on  an 
empirical  identification  scheme.  The  event  definitions  only  became  publicly
accessible  during  the  final  drafts  of  this  technical  note;  the  analysis  for  which  it 
is  needed  should  not  be  so  dependent  on  such  happenstance. 


5.5  Subject  Matter  Expertise  for  Options  Assessment 

•  Vetting  of  commercial  tools  would  have  been  far  more  efficient  and  lucid  with  access  to  the 
proper  knowledge  about  email  and  directory  services. 

Vendors  of  tools  such  as  those  vetted  are  usually  speaking  to  SMEs.  Not  having  that 
background  is  a  communication  barrier. 

•  For  commercial  solutions  that  qualify  as  candidates,  subject  matter  expertise  may  be  needed 
to  determine  whether  imperfect  robustness  (Section  5.3)  points  to  a  fundamental  limit  in 
identification. 

For  the  relevant  identities  (those  in  records  for  non-intermediate  hops)  that  do  not  appear  to 
represent  users  at  all,  tool  performance  will  indicate  whether  they  actually  do  not  lend 
themselves  to  any  kind  of  identification  method.  A  SME  might  be  able  to  determine 
whether  this  is  a  fundamental  limitation  in  that  there  simply  isn’t  enough  data  to  resolve 
identity  even  with  the  use  of  the  AD,  regardless  of  the  tool  or  method.  If  so,  it  is  an  upper 
bound  on  the  accuracy  of  the  data  for  the  SNA. 

Such  a  determination  has  ramifications  for  the  home-grown  options  also.  No  further  effort 
should  be  spent  on  the  impossible  identifications.  This  should  simplify  home-grown 
solutions,  decrease  development  time,  and  dispel  their  disadvantage  relative  to  commercial 
solutions. 

•  On  the  other  hand,  subject  matter  expertise  may  also  inform  the  determination  that  such  an 
upper  bound  on  accuracy  is  not  as  solid  as  it  seems  e.g.  because  bad  data  can  be 
circumvented  by  tracking  messages  across  server  logs  and  combining  the  records. 

The  apparent  impossibility  of  identification  may  actually  be  relative,  and  may  depend  on  the 
cost  and  amount  of  effort  that  can  be  afforded  on  forensics.  For  example,  the  tracking  logs 
may  contain  enough  data  to  mine  and  reconstruct  message  pathways  from  server  to  server, 
thus  allowing  sender/recipient  to  be  identified  from  some  records  even  when  they  are 
missing  from  others.  This  generalizes  the  scheme  of  matching  incoming  and  outgoing 
messages  (Section  4.5.3).  A  quick  search  shows  that  such  tracking  is  almost  certainly 
available  on  Exchange  Server®  software.  (With  this  functionality  in  mind,  and  the 
knowledge  of  multiple  hops  per  email  delivered,  the  corroborating  descriptions  in  the
tracking  log  specifications  of  Annex  F  become  evident).  The  SME  would  have  a  much
better  idea  of: 

■  The  robustness  of  using  tracking  to  completely  identify  sender  and  recipient 

■  How  easy  it  is  to  get  data  in  the  form  required  for  the  SNA  from  the  results  of 
such  tracking  by  the  server. 

■  The  tractability  of  tracking  a  large  volume  of  email  in  the  context  of  DND’s 
large  email  system 

■  In  consultation  with  vendors  e.g.  in  Section  4.2  and  Annex  D,  how  likely  it  is 
that  a  robust  commercial  tool  relies  on  tracking,  either  built-in  or  from  the  server. 
Such  reliance  exposes  them  to  any  limitations  in  message  tracking  that  may 
apply  within  the  DND  email  environment. 

The  practicality  of  developing  such  tracking  functionality  in-house  in  a  reasonable  amount 
of  time  is  questionable  because  it  would  require  intimate  familiarity  with  the  Exchange 
servers,  tracking  logs,  and  algorithms  and  data  structures  for  high-speed  matching  over  a 
large  volume  of  data.  For  commercial  tools,  however,  it  isn’t  clear  what  is  impractical  once 
data  is  filtered,  pre-processed,  and  read  into  a  relational  database,  such  as  that  required  by 
the  tools  vetted. 

A  potential  source  of  intractability  in  DND  is  the  need  for  the  logs  of  intermediate  servers  in 
the  reconstructed  message  pathways,  especially  if  such  pathways  are  not  deterministic  (an 
SME  would  likely  know  if  they  are).  In  the  worst  case,  the  paths  are  completely 
unpredictable  and  the  logs  for  all  the  servers  would  be  needed.  This  implies  a  large  number 
of  servers  to  monitor,  even  if  there  are  a  small  number  of  interesting  users  and  a  small 
number  of  servers  hosting  them.  Based  on  subject  matter  expertise,  the  tractability  of  the 
required  amount  of  data  and  processing  needs  to  be  checked  with  the  scope  of  the  project. 

•  Depending  on  the  technical  details  behind  the  various  user  identification  problems  and  the 
various  approaches  to  preprocessing,  subject  matter  expertise  may  be  needed  to  assess  the 
impact  on  the  resulting  SNA  data. 

A  generalization  of  the  previous  point  is  that  there  may  be  identities  that  require  varying 
degrees  of  sophistication  and  resources  to  resolve  correctly,  which  means  that  the  various 
tools  and  approaches  will  have  different  degrees  of  identification  robustness  and  feasibility. 
This  applies  to  solutions  that  are  home-grown,  use  commercial  tools,  or  a  combination  of 
the  two.  For  example,  solutions  that  simply  discard  records  containing  hard  identification 
problems  may  be  quite  feasible38,  but  not  robust  against  many  of  the  vagaries  of  the 
identification data. Expertise in Exchange Server® operation could inform the assessment
of how severe the effects of lost data are, by considering the circumstances that give rise to
the records that are not handled (or worse, improperly handled) by said solutions, rather than
considering just the quantity of erroneously processed records. For example, a 10% loss of
data may be acceptable if it is randomly distributed, but can change the patterns in the
SNA graph if the cause is systematic. If the erroneous records actually misidentify sender
or recipient, an SME might also be able to estimate the seriousness of the SNA data pollution
by considering the technical details of how the misidentification is made39.


38 In terms of cost, time, and technical difficulty.
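
The idea raised earlier in this section, of following a message from server to server so that a sender or recipient missing from one record can be recovered from another, can be illustrated with a minimal Python sketch. The record layout (the keys `message_id`, `server`, `sender`, and `recipients`) is hypothetical and is not the Exchange Server® tracking log format of Annex F; the sketch assumes only that records for the same message share an identifier.

```python
from collections import defaultdict


def merge_tracking_records(records):
    """Combine per-server records that share a message ID.

    Each record is a dict with hypothetical keys: 'message_id', 'server',
    'sender' (str or None) and 'recipients' (list, possibly empty).
    """
    merged = defaultdict(lambda: {"sender": None, "recipients": set(), "servers": []})
    for rec in records:
        entry = merged[rec["message_id"]]
        entry["servers"].append(rec["server"])
        # Keep the first sender seen; records from other hops may lack it.
        if entry["sender"] is None and rec.get("sender"):
            entry["sender"] = rec["sender"]
        entry["recipients"].update(rec.get("recipients") or [])
    return dict(merged)


def unresolved(merged):
    """Message IDs that still lack a sender or any recipient after merging."""
    return sorted(mid for mid, entry in merged.items()
                  if entry["sender"] is None or not entry["recipients"])


# Illustrative data only: message m1 is seen on two servers, m2 on one.
records = [
    {"message_id": "m1", "server": "A", "sender": "josh", "recipients": []},
    {"message_id": "m1", "server": "B", "sender": None, "recipients": ["anna"]},
    {"message_id": "m2", "server": "A", "sender": None, "recipients": ["phil"]},
]
merged = merge_tracking_records(records)
```

In this sketch, message `m1` becomes fully identified once the two records are combined, while `m2` remains unresolved because no record carries its sender. Whether real tracking logs support such a join, and at what volume, is exactly the kind of question the text assigns to the SME.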


5.6  Feasibility  of  Email  SNA 

The lessons learned have so far dealt with the challenges in planning SNA data collection and
preparation and, at the next level of detail, the challenges and unknowns in the candidate
approaches to their solution. At this point, an SNA within DND, let alone externally, sounds very
difficult.

Many of these challenges, however, stem from the particular circumstances of the Polar Guardian
SNA. If this had to be boiled down to one all-encompassing point, it would be the fact that DIMEI
was not approached for active involvement with the planning at the outset. This oversight is not
difficult to understand. Without awareness of how involved it can be, why wouldn't one initially
forge ahead, gather data, preprocess it, and analyze the result? This led to the discovery of the
bottom-up problem of trying to generate meaningful SNA input from what is possibly not the best
source, with just an articulation of how it could be done otherwise, along with some risk identification.

Getting  buy-in  from  a  level  high  enough  in  DIMEI  to  commit  the  necessary  SME  resources  to 
SNA  could  completely  change  the  outlook.  If  options  for  data  gathering  at  the  servers  were  well 
understood,  it  is  possible  that  all  the  challenges  and  questions  related  to  user  identification  in  the 
tracking logs could simply become irrelevant. In fact, it is difficult to see how an SME would not
be able to advise on the most suitable method of data collection and help with the permissions in
setting it up. The inclusion of a technical subject matter expert in the SNA effort would avoid the
situation where the SMEs respond primarily to operational priorities, and respond to queries
from peripheral research efforts by unknown analysts only on an opportunistic basis. Operational
priorities  often  mean  that  such  sporadic  responses  can  do  little  more  than  factually  answer  the 
technical  questions  as-posed,  sometimes  after  several  iterations  of  clarification,  rather  than 
providing  contextual  information  and  advice  on  the  approach  (for  example,  on  alternatives  to 
generating  information  from  the  tracking  logs). 

In  this  study,  DIMEI  was  found  to  be  a  key  player,  technically  and  otherwise.  The  organizational 
and interpersonal dimensions of engaging key players are not technical, but are clearly recognized
in  an  explicit  manner  in  [10]. 

For  an  interagency  SNA,  the  greater  technical  diversity,  and  organizational  and  political 
complexity,  mean  that  having  buy-in  from  a  high  enough  level  is  even  more  important  to  ensure 
that  the  relevant  information  stewards  and  technical  SMEs  support  the  effort. 

5.7  Considerations  for  an  Interagency  SNA  based  on  Email 

The  prospects  of  incorporating  OGDs  into  an  email-based  SNA  can  be  expected  to  depend  on  the 
data  collection  method.  If  it  deploys  applications  on  their  mail  servers,  OGDs  may  be  quite 
sensitive about it. If it merely requires access to their mail server logs and access to their AD for
user identification, the prospects are likely better40. Liaising with the OGDs is needed, however,
to get a more concrete idea of the likelihood for approval. Subject matter expertise is needed to
determine if these two scenarios are indeed the only main options.


39 Indeed, the feasibility and confidence of such an estimate would be in the province of the SME.

Subject  matter  expertise  is  also  needed  to  anticipate  the  severity  of  the  following  challenges,  and 
the  expertise  and  tasking  to  resolve  them. 

•  The challenges in integrating the data across separate domains need to be determined.

•  The  challenges  arising  from  possibly  dissimilar  mail  systems,  networks,  and  configurations 
thereof,  need  to  be  anticipated. 

•  Liaising  with  the  OGDs  from  a  position  of  technical  expertise  is  needed  to  assess  the  extent 
of  such  dissimilarities.  Whether  disparateness  of  systems/networks  within  the  same  OGD 
poses  additional  challenges  also  needs  to  be  determined. 

If  it  turns  out  that  the  challenges  to  such  an  SNA  are  beyond  the  scope  of  a  given  project,  a 
completely  survey-based  SNA  may  be  preferable. 

5.8  Survey-Based  SNA:  Motivations 

Though  an  SNA  based  on  electronic  communication  can  be  appealing  for  its  objectiveness,  a 
survey-based  SNA  provides  flexibility  in  the  design  of  the  questions  to  elicit  data  about  different 
kinds  of  relationships  e.g.  the  different  circumstances  that  prompt  different  kinds  of 
communications,  and  the  kinds  of  information  conveyed.  While  most  of  the  time  in  Polar 
Guardian’s  SNA  was  focused  on  getting  the  right  input  from  logs  on  electronic  communication, 
two  suggestions  were  kept  in  mind  for  the  follow-up  survey  phase  discussed  in  Section  1.2. 

•  If  possible,  engage  participants  with  SNA  graphs  and  analysis  results  from  an  email-based 
SNA  prior  to  conducting  the  survey. 

•  If  it  is  appropriate  for  the  culture  of  the  organization(s),  offer  reasonably  enticing  incentives 
for  returned  surveys.  It  has  been  found  that  gifts  work  well  for  this.  There  is  no  second 
chance,  and  missing  information  affects  the  patterns  in  the  social  network,  and  potentially 
the  conclusions  that  arise  from  its  analysis. 


40  From  the  perspective  of  policies/procedures  pertaining  to  security  and  email  administration.  As 
mentioned for intra-DND email, the ethics and legality also need investigation.


6  Conclusions and Recommendations


Though  the  email-based  SNA  in  this  project  was  not  carried  out  to  completion,  the  preliminary 
investigations  under  Polar  Guardian  can  be  considered  a  success  in  establishing  a  baseline 
knowledge  about  what  is  involved  in  conducting  such  a  study.  Going  into  this  effort,  there  was 
very  little  information  about  what  could  be  expected  in  obtaining  and  preparing  data  for  such  an 
SNA  outside  of  a  completely  controlled  common  operating  environment  for  an  experiment. 
Technical  and  administrative/policy  requirements  and  challenges  were  identified,  as  were 
possible  approaches  to  their  solution.  Possible  trade-offs  of  the  approaches  were  identified  with 
respect  to  uncertainties  in  technical  feasibility,  acquisition  of  assets,  installation  approvals,  and 
delay.  Furthermore,  areas  were  identified  in  which  expert  consultation  and  investigation  are 
needed,  especially  concerning  solutions  implemented  onto  the  email  servers,  and 
challenges/unknowns  in  compiling  data  on  interagency  emails. 

The  overarching  lesson  that  can  be  taken  away  from  the  Polar  Guardian  effort  is  the  need  for  the 
active  involvement  of  an  SME  on  email  administration,  and  possibly  on  directory  services  for 
local  area  networks.  It  is  also  important  that  the  SME  have  visibility  into  the  DND  email  system 
e.g.  the  SME  would  have  had  involvement  with  DIMEI,  or  ideally,  would  come  from  DIMEI. 
Opportunistic  discussions  with  IT  staff  that  have  visibility  into  the  areas  of  email  administration 
and  directory  services  indicate  that  these  knowledge  areas  are  vast.  Obviously,  not  all  of  that 
knowledge  is  needed  to  obtain  SNA  data;  it  isn’t  clear,  however,  how  much  of  that  knowledge  is 
required.  Despite  the  technical  challenges  and  unknowns  reported  herein,  the  bridge  of  technical 
knowledge  and  interdepartmental  cooperation  that  would  be  enabled  by  SME  involvement  could 
invert  the  outlook  on  the  SNA’s  feasibility  from  one  containing  many  challenges,  unknowns,  and 
possible  outcomes  to  one  containing  a  few  (or  a  single)  simple  way(s)  forward,  with  minimal 
uncertainty  as  to  the  resources,  effort,  and  time  required,  and  a  clear  understanding  of  the  caveats 
associated  with  the  resulting  data. 

In  an  email-based  SNA,  there  is  a  serious  need  for  such  technical  expertise  for  all  aspects  of  data 
preparation,  including  the  assessment  and  implementation  of  the  three  broad  options:  (1)  direct 
use  of  partial  information  in  the  tracking  logs,  with  possible  assistance  from  directory  services, 
and  the  impact  of  irresolvable  identities  on  the  SNA;  (2)  commercial  tools  that  claim  to  be  able  to 
generate  the  required  data,  either  from  the  servers  or  from  tracking  logs,  with  possible  assistance 
from  directory  services;  and  (3)  the  ability  to  generate  the  required  data  using  email  server 
capabilities. 

Within  DND,  the  need  for  this  expertise  will  be  in  the  area  of  Microsoft  Exchange  Server®.  With 
respect  to  option  (1),  such  expertise  is  needed  to  assess  and/or  mature/complete  the  solutions  and 
trade-offs  described  in  this  study.  With  respect  to  options  (2)  and  (3),  such  expertise  is  needed  to 
fully  understand  the  need  for,  and  implications  of,  deploying  various  applications  onto  DWAN 
machines  and/or  email  servers,  and/or  access  to  directory  services.  This  expert  understanding  is 
essential  to  establishing  a  case  for  their  purchase  and/or  deployment,  if  necessary. 

Technical  expertise  in  email  administration  is  also  needed  to  scope  out  and  resolve  the  challenges 
in  obtaining  and  using  data  from  multiple  agencies.  Depending  on  discussion  with  email 
administrators in the other agencies, and the expertise that can be leveraged from them, it is
possible that such expertise may have to extend beyond Microsoft Exchange Server® to other
email  servers,  and  to  IT  standards  pertaining  to  email  in  general. 

The  involvement  of  the  SME  cannot  be  superficial.  He/she  should  be  readily  accessible,  as 
needed,  and  take  an  active  role  in  the  solution  to  the  data  acquisition  and  preparation.  It  is 
recommended  that  the  SME  be  a  member  of  the  team  conducting  the  SNA,  at  least  for  the  initial 
stages  for  data  acquisition,  and  preferably  beyond. 

With  limitations  such  as  those  encountered  in  Polar  Guardian,  however,  the  time  frame  for  the 
SNA  needs  to  be  expanded  to  properly  assess  the  suitability  of  commercial  solutions  e.g.  through 
demonstrations targeted to the project's needs, or (especially) to develop solutions to the user
identification  problems  in  home-grown  solutions. 

With  respect  to  commercial  solutions,  an  assessment  is  also  needed  of  CFEC's  ability  to  meet  the 
prerequisites  of  those  applications  in  a  timely  manner,  such  as  availability  of  database  software, 
and  ready  access  to  directory  services  such  as  Active  Directory.  Delays  in  approval,  purchase, 
and/or  installation  need  to  be  taken  into  consideration. 

With  respect  to  home-grown  solutions,  it  may  be  the  case  that  within  the  project  limits  for  time  or 
resources,  not  enough  insight  can  be  gained  into  causes  of,  and  solutions  to,  the  user  identification 
problems,  or  that  the  solutions  are  not  feasible.  The  social  network  analyst  may  simply  have  to 
accept that a portion of the communications will not be reflected in the SNA. SME input could
shed light on how adversely the SNA would be impacted by this, depending on how systematic
the lost data is. The analysts will then have to decide how important such inaccuracies are,
depending on the purpose of the SNA and the circumstances that lead to lost data.
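
The distinction between evenly spread and systematic loss can be made concrete with a small Python sketch. The traffic figures and user names are invented for illustration, and dropping every tenth message is used here as a deterministic stand-in for random loss.

```python
from collections import Counter

# Hypothetical traffic: 100 messages spread over four directed ties.
messages = ([("a", "b")] * 40 + [("b", "c")] * 30 +
            [("c", "a")] * 20 + [("d", "a")] * 10)

# Loss spread evenly over the data: drop every 10th message (10% of the total).
spread_loss = [m for i, m in enumerate(messages) if i % 10 != 9]

# Systematic loss of the same size: every message sent by user "d" is lost,
# for example because that user's records cannot be resolved.
systematic_loss = [m for m in messages if m[0] != "d"]

# Tie strengths (message counts per sender-recipient pair) in each case.
full = Counter(messages)
spread = Counter(spread_loss)
systematic = Counter(systematic_loss)

# Ties that vanish from the SNA graph under each kind of loss.
missing_spread = set(full) - set(spread)
missing_systematic = set(full) - set(systematic)
```

Both cases lose ten messages. With the evenly spread loss every tie survives at reduced strength, whereas the systematic loss removes the tie from `d` to `a` altogether, which changes the shape of the graph rather than just its weights.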

With  the  current  lack  of  subject  matter  expertise,  solutions  that  need  to  be  deployed  on  the  email 
servers  are  not  likely  to  be  technically  feasible  in  the  short  term.  The  administrative  delays  in 
their  approval,  and  the  risk  of  denial,  are  also  expected  to  be  significant. 

The  current  outlook  on  the  three  factors  above  (time  expenditures  to  assess  candidate  products, 
potential  inaccuracies  in  the  email  traffic  volumes  from  home-grown  solutions,  and  challenges  to 
server-hosted  solutions)  may  change  significantly  with  new  knowledge,  such  as  input  from  an 
SME.  It  is  possible  that  the  aforementioned  expansion  of  the  time  frame  in  particular  could  be 
made  more  concrete  with  expert  technical  input,  and/or  that  it  need  not  be  significantly  more  than 
initially  conceived  if  an  SME  is  involved  with  the  implementation  or  deployment. 

A  significant  lead-time  should  be  scheduled  for:  (1)  obtaining  approval  of  the  legality  and 
ethicality  of  accessing  the  data  on  email  communications;  (2)  obtaining  permission  for,  and 
arranging,  access  to  the  data;  and  (3)  identifying  the  users  to  be  studied.  Within  DND,  all  of  these 
seem  to  be  plausible,  if  not  quick.  For  interagency  communication,  the  feasibility  and  time 
frames  for  these  requirements  need  to  be  assessed  in  consultation  with  the  relevant  authorities  in 
each  organization.  It  is  possible  that  authorities  outside  of  the  participating  organizations  may 
also  need  to  be  consulted  regarding  (1).  With  respect  to  (1)  and  (2)  (as  opposed  to  the  previously 
mentioned  technical  approvals  for  deployment  of  applications),  subject  matter  expertise  in  email 
administration  and/or  directory  services  is  expected  to  be  important  in  establishing  the  case  for 
their approval.

The discussion thus far has focused on identifying and addressing the front-end challenges to
corporate  SNA  based  on  email,  and  to  a  limited  degree,  interagency  SNA  based  on  email.  It 
would  be  fitting  to  also  discuss  what  SNA  can  do  for  DND/CF,  and  hence  why  such  additional 
lengths  are  worthwhile41.  Section  1.1  describes  the  extreme  situation  in  which  it  is  not  known 
how  closely  actual  communications  follow  formal  protocols,  in  the  context  of  interagency  SA. 
Within  the  military,  however,  there  can  be  a  wide  variation  in  the  level  of  detail  to  which  formal 
protocols are specified, beyond SA to command and control, and depending on such factors as a
particular  commander’s  personality  [10].  SNA  is  a  tool  to  reveal  actual  communications  patterns 
in  order  to  sanity  check  and  streamline  operations.  Analysts  are  not  left  relying  solely  on  patterns 
suggested  by  protocol,  which  themselves  might  be  rather  open-ended,  and  the  adherence  to  which 
can  only  be  assumed42.  Such  ground  truth  is  important  for  situations  requiring  timely  shared  SA, 
completeness  of  information,  and  timely  response. 

In  the  domain  of  joint,  multinational  and  interagency  experiments  and  exercises43,  SNA  has 
already  been  used  to  inform  concept  development44,  albeit  in  a  controlled  experimental 
communications  environment.  Extending  this  into  the  operational  domain  within  DND  and 
beyond  would  bring  insight  not  only  to  higher  fidelity  exercises  that  are  conducted  on  operational 
equipment/facilities,  but  also  to  operations  itself.  An  SNA  could  suggest  where  to  investigate 
further  in  troubleshooting  unexpectedly  untimely  response.  Operations  can  also  be  compared 
with  exercises  to  improve  the  fidelity  of  the  latter.  A  final  example  of  SNA’s  utility  is  to  inform 
the development of protocols for a liaison officer in an operations centre to reach back to his/her
home  organization,  based  on  the  information  flows  therein. 

Both  exercises  and  operations  can  be  studied  to  validate  the  thinking  behind  concepts  of 
operation,  not  only  by  characterizing  specific  communications  pathways  of  initial  interest,  but 
also  by  revealing  unexpected  features.  One  example  is  to  identify  highly  connected  nodes  that 
may  be  critical  failure  points,  and  that  could  benefit  from  planned  redundancy  e.g.  in  staffing. 
Another  example  comes  from  a  recent  study  revealing  no  communication  flows  between  Marine 
Security  Operations  Centre  East  and  Government  of  Canada  Operations  Centre,  thus  indicating  a 
lack of formal mechanisms for their interaction [11]45.


41  These  considerations  also  apply  to  the  more  complete  SNA  based  on  electronic  communications, 
including  phone  communication,  as  mentioned  in  Section  1.2. 

42  In  fact,  doctrine  in  particular  is  only  meant  to  represent  the  guiding  default  conduct  for  a  situation,  to  be 
overridden  whenever  an  alternative  is  deemed  more  appropriate. 

43  Becoming  increasingly  important  in  the  current  day  push  for  a  comprehensive  approach  to  CF  operations 
[12].

44  For  MNE  4,  the  SNA  included  all  manner  of  electronic  interaction  [4]  [5].  SNA  was  also  intended  for 
MNE  5,  but  was  not  completed  due  to  complications  in  maintaining  the  resourcing  of  the  SNA  with  key 
personnel.  Chat  data  from  MNE  4  was  also  analyzed  outside  of  the  MNE  4  purview  [13],  in  conjunction 
with  email  data  from  an  interagency  Command  Post  Experiment  known  as  Pegasus  Guardian  [14]  [15]  [16]. 

45  The  communication  picture  in  the  latter  example  is  more  of  a  cross  between  protocol  and  reality,  since 
the data was gathered from the operators and documents rather than based on actual communications.


Annex A  Overview of Pajek


The aim of this annex is to provide a quick overview of Pajek, a freeware application that
performs social network analysis. The various commands of Pajek are only partially
described; the discussion focuses on the utility of Pajek. Readers interested in learning
how to use the program are referred to the user manual, which is available online at the following
URL: <http://vlado.fmf.uni-lj.si/pub/networks/pajek/>.

Pajek is a program for the analysis and visualization of large networks. It was
developed by Vladimir Batagelj and Andrej Mrvar of the University of Ljubljana in Slovenia.
Pajek has evolved considerably since its first version came out in November 1996. The program is
freeware and its latest version (Pajek 1.14) can be downloaded at the following URL:
<http://vlado.fmf.uni-lj.si/pub/networks/pajek/>. Although Pajek can be used for representing
chemical molecules, its main utility is for social network analysis, the goal of which is to identify
and interpret patterns of social ties among actors.

From an SNA perspective, a network is equivalent to a mathematical graph - composed of vertices
and links between some of these vertices - with additional information associated with the vertices
and links (e.g., vertex labels and types of link). Pajek deals with various objects that can be
associated with a network: the network itself, partitions of vertices, permutations of vertices,
clusters of vertices, hierarchies of vertices, and numerical vectors. These objects are defined as
follows:

•  Network:  Set  of  vertices  and  links  between  vertices  with  possibly  additional  information 
associated  with  the  vertices  and  links. 

•  Partition:  Object  that  associates  each  vertex  with  a  given  class  of  vertices.  Partitions  may 
specify  structural  properties  or  attributes  of  vertices. 

•  Permutation:  A  set  of  rules  modifying  the  ranking  of  an  ordered  set  of  vertices. 

•  Cluster: Subset of vertices. A given class of vertices constitutes a cluster, but the converse is
not necessarily true. In other words, the vertices belonging to the same cluster do not
necessarily belong to the same class.

•  Hierarchy: Ordered subsets of vertices.

•  Numerical vector: Used to associate a set of numerical properties with each vertex.

Pajek offers the possibility to create, modify, transform, and visualize all these types of objects. It
is also possible to write a program that would display the dynamics of the visualized network.
Finally, Pajek can be used to obtain various properties and statistics on the created objects.

Figure A-1 displays Pajek's interface. The window contains drop-down menus to access all
loaded objects: networks, partitions, permutations, clusters, hierarchies, and vectors. All the
loaded objects can be transformed and analyzed using the menu available at the top of the
window.

Figure A-1. Pajek's main interface window.

Although  networks  can  be  created  manually  within  Pajek,  it  is  also  possible  to  create  a  network 
from  a  DOS  text  or  ASCII  file.  A  generic  input  file  will  have  the  following  structure: 

    *Vertices 5
    1 "Josh" 0.15 0.25 0.5
    2 "Phil" 0.35 0.7 0.5
    3 "Bill" 0.55 0.11 0.5
    4 "Karl" 0.75 0.65 0.5
    5 "Anna" 0.9 0.45 0.5
    *Arcs
    1 5 0.8
    2 5 0.7
    3 5 0.3
    4 3 0.6
    5 4 0.8
    *Edges

Two different types of links can be entered in Pajek: arcs or edges. Arcs are directed links going
from one vertex toward another one. Edges are undirected links. In the example above, the
network is composed of 5 vertices. A label is assigned to each vertex (e.g., "Josh"). Three
numbers between 0 and 1 are also used to specify the 3D coordinates of each vertex. Below the
vertex data are the arc data. Each arc is described by first specifying its starting vertex, then its
ending vertex, and finally an attributed weight. Similar data would be used to specify the edges.
Note that the vertex label, vertex coordinates and the weight of the links are all optional data.
Pajek possesses algorithms to select optimal locations for the nodes. In particular, the user can
have the vertex coordinates chosen so as to minimize the number of links crossing each other
or to impose a minimum distance between unlinked vertices (it is widely accepted that the visual
distance between vertices should vary inversely with the number of links between them, strongly
linked vertices being near each other).
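
Since SNA input of this kind would normally be generated from preprocessed email data rather than typed by hand, the following minimal Python sketch shows how the listing above could be produced programmatically. The function name `to_pajek` and its argument layout are illustrative choices, not part of Pajek; the output simply mirrors the structure of the example file and has not been verified against a particular Pajek version.

```python
def to_pajek(vertices, arcs=(), edges=()):
    """Return network text laid out like the example input file.

    vertices: list of (label, (x, y, z)) tuples, numbered from 1 in list order.
    arcs, edges: iterables of (start, end, weight) using those vertex numbers.
    """
    lines = [f"*Vertices {len(vertices)}"]
    for number, (label, coords) in enumerate(vertices, start=1):
        xyz = " ".join(str(c) for c in coords)
        lines.append(f'{number} "{label}" {xyz}')
    lines.append("*Arcs")
    lines.extend(f"{s} {e} {w}" for s, e, w in arcs)
    lines.append("*Edges")
    lines.extend(f"{s} {e} {w}" for s, e, w in edges)
    return "\n".join(lines) + "\n"


# The five-vertex example network from this annex.
vertices = [
    ("Josh", (0.15, 0.25, 0.5)),
    ("Phil", (0.35, 0.7, 0.5)),
    ("Bill", (0.55, 0.11, 0.5)),
    ("Karl", (0.75, 0.65, 0.5)),
    ("Anna", (0.9, 0.45, 0.5)),
]
arcs = [(1, 5, 0.8), (2, 5, 0.7), (3, 5, 0.3), (4, 3, 0.6), (5, 4, 0.8)]
text = to_pajek(vertices, arcs)
```

In an email-based SNA, the arc weights would typically be message counts or weighted counts per sender-recipient pair.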

Figure A-2. Network as displayed by Pajek.


Figure  A-2  shows  the  graph  as  displayed  by  Pajek  after  optimizing  the  node  location.  The  weight 
associated  with  each  link  is  displayed  in  red.  Note  that  the  nodes  are  also  partitioned  according  to 
gender.  The  partition  is  displayed  by  the  color  associated  with  the  node:  yellow  for  female,  blue 
for  male. 

In  addition  to  displaying  the  partition  using  color  code,  Pajek  can  also  display  numerical 
properties  associated  with  the  vertices  by  varying  the  size  of  their  associated  node.  Figure  A-3 
displays  a  network  where  the  following  values  were  associated  with  each  node:  ‘Josh’  has  a  value 
of 10, ‘Phil’ a value of 15, ‘Bill’ a value of 20, ‘Karl’ a value of 12, and ‘Anna’ a value of 20.

Pajek also offers tools for the visualization of large networks. It is possible to merge all the
vertices pertaining to the same partition into a single vertex. This way, a more global view of the
network is obtained; only the connections between the different partitions are displayed (this
operation  is  called  a  global  view).  Another  possibility  is  to  display  only  a  subset  of  the  network, 
deleting  all  other  vertices  and  links  outside  of  this  sub-network  (this  operation  is  called  a  local 
view).  It  is  also  possible  to  merge  these  two  operations:  all  the  vertices,  except  those  pertaining  to 
a  given  sub-network,  are  merged  into  aggregate  vertices  based  on  the  partition  to  which  they 
pertain  (this  operation  is  called  a  contextual  view). 
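
The global view described above can be sketched in a few lines of Python. This is not Pajek's own implementation; it only illustrates the idea of merging vertices by partition class and summing the link weights between classes. The assignment of vertices to the classes "male" and "female" is an assumption made for the example, and links inside a class are simply dropped here.

```python
from collections import defaultdict


def shrink_by_partition(arcs, partition):
    """Merge vertices of the same class and sum arc weights between classes.

    arcs: iterable of (start, end, weight); partition: {vertex: class label}.
    Arcs whose two ends fall in the same class are dropped in this sketch.
    """
    merged = defaultdict(float)
    for start, end, weight in arcs:
        a, b = partition[start], partition[end]
        if a != b:
            merged[(a, b)] += weight
    return dict(merged)


# Example network from this annex; the class of each vertex is assumed.
arcs = [(1, 5, 0.8), (2, 5, 0.7), (3, 5, 0.3), (4, 3, 0.6), (5, 4, 0.8)]
partition = {1: "male", 2: "male", 3: "male", 4: "male", 5: "female"}
global_view = shrink_by_partition(arcs, partition)
```

A contextual view could be obtained in the same way by giving each vertex of the sub-network of interest its own class label, so that only the remaining vertices are aggregated.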

In  addition  to  visualization  tools,  Pajek  offers  various  tools  for  analyzing  networks.  In  particular, 
various  measures  of  centrality  of  the  vertices  can  be  computed  (refer  to  the  resources  in  the 
Bibliography  for  a  discussion  of  centrality  measures).  The  overall  structure  of  the  network  can 
also  be  analyzed.  For  example,  Pajek  can  find  important  links,  the  removal  of  which  would  break 
the  network  into  unconnected  parts.  Pajek  can  also  be  used  to  investigate  the  strong  components 
of  a  network.  In  the  example  above,  Anna,  Karl,  and  Bill  form  a  strong  component  because  it  is 
possible to move from any vertex of this sub-group to any other vertex of this sub-group while
following  the  directed  arcs.  Finally,  Pajek  can  be  used  as  a  simulation  tool.  For  instance,  the 
spread  of  a  disease  or  of  information  among  a  group  of  people  represented  by  vertices  can  be 
simulated. 
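
The strong component in the example can be checked independently of Pajek. The sketch below uses Kosaraju's algorithm, a standard method for finding strong components; it is offered as an illustration and makes no claim about the algorithm Pajek itself uses.

```python
def strong_components(vertices, arcs):
    """Kosaraju's algorithm: return a list of sets of mutually reachable vertices."""
    forward = {v: [] for v in vertices}
    backward = {v: [] for v in vertices}
    for start, end in arcs:
        forward[start].append(end)
        backward[end].append(start)

    # First pass: depth-first search, recording vertices as each search finishes.
    visited, order = set(), []

    def visit(v):
        visited.add(v)
        for w in forward[v]:
            if w not in visited:
                visit(w)
        order.append(v)

    for v in vertices:
        if v not in visited:
            visit(v)

    # Second pass: search the reversed graph in reverse finishing order.
    assigned, components = set(), []

    def collect(v, component):
        assigned.add(v)
        component.add(v)
        for w in backward[v]:
            if w not in assigned:
                collect(w, component)

    for v in reversed(order):
        if v not in assigned:
            component = set()
            collect(v, component)
            components.append(component)
    return components


# Arcs of the example network, written with vertex labels.
names = ["Josh", "Phil", "Bill", "Karl", "Anna"]
arcs = [("Josh", "Anna"), ("Phil", "Anna"), ("Bill", "Anna"),
        ("Karl", "Bill"), ("Anna", "Karl")]
components = strong_components(names, arcs)
```

For this network the result contains the component formed by Anna, Karl, and Bill, with Josh and Phil each forming a component of their own, since no arc leads back to either of them.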

Annex B  Formulae for Soft Limited Weighting of Multi-recipient Emails


One  need  only  consider  the  less  relevant  broadcast  emails  that  one  receives  to  realize  that  a  single 
email  sent  simultaneously  to  N=20  recipients  is  unlikely  to  be  worth  as  much  as  20  different 
emails  sent  individually  to  different  people,  at  least  in  terms  of  meaningful  communication  that  is 
indicative of a close relationship. It is necessary to count the 20-recipient email as being worth
somewhat less than 20 individual emails, so the contribution to the SNA connection to each
recipient will be less than a single email, i.e. a fraction of one email. A means is needed to
quickly assign such an attenuating scale factor, or weighting, to the overall number of 20 recipients
to  arrive  at  the  equivalent  number  of  single-recipient  emails.  This  will  commensurately  attenuate 
the  contribution  to  the  SNA  connection  to  each  recipient. 

This  annex  focuses  on  the  algebraic  form  of  formulae  for  soft  limited  weighting.  It  does  not 
cover  further  elaboration  on  the  motivation  for  such  weighting. 

Different weighting examples can be devised if one regards emails with recipient lists longer than
some threshold (say $N_0 = 30$ people) as contributing no additional information about relationships
between individuals. Under such a weighting, the value $N_{Tot}$ of an $N$-recipient email approaches
the equivalent of $N_0$ single-recipient emails as $N$ increases, but never reaches $N_0$. For example,
the total weight $N_{Tot}(N)$ for all $N$ recipients can be defined as the lesser of $N_{Tot}(N) = N$ and
$N_{Tot}(N) = N_0$, with a soft transition where the two cross over. Formulas can be readily borrowed
from semiconductor physics, e.g., $N_{Tot}(N) = N_0 N / (N_0 + N)$ 46, which soft-limits at $N_0 = 30$
(Figure B-1). This implies a per-recipient weight of $N_{Tot}(N)/N = N_0 / (N_0 + N)$.
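
As a numerical illustration of this first formula, the following Python sketch evaluates the total and per-recipient weights. The function names are illustrative, and the default threshold of 30 is the example value used in the text.

```python
def total_weight(n, n0=30):
    """Soft-limited total weight N_Tot(N) = N0*N/(N0+N) of an N-recipient email."""
    return n0 * n / (n0 + n)


def per_recipient_weight(n, n0=30):
    """Weight added to each sender-recipient tie: N_Tot(N)/N = N0/(N0+N)."""
    return n0 / (n0 + n)


# The 20-recipient email used as the running example in this annex.
total_20 = total_weight(20)
each_20 = per_recipient_weight(20)
```

With these definitions, a 20-recipient email counts as the equivalent of 12 single-recipient emails in total, or 0.6 of an email toward each recipient's tie. Note that under this formula even a single-recipient email is weighted slightly below 1 (30/31).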

A more flexible soft limiting can be devised based on logarithmic Bode plots for electronic
circuits, where frequency responses follow straight lines, with soft transitions as they cross over.
To soft limit at $N_0$, it can be shown that $N_{Tot}(N) = N_0 - \log_b(1 + b^{N_0 - N})$, where $b > 1$.
Values of $b$ close to 1 result in soft limiting, and large values, e.g. 20, result in sharp limiting
(Bode plots use 10 by default, but the curves tend to have sharp knees). Figure B-2 shows a
variation of this (explained below) for $b = 1.2$. The analyst can experiment with $b$ until he/she
obtains a curve of diminishing returns that is felt to be appropriate.

Figure B-2 incorporates a correction factor to eliminate a slight nonzero offset in $N_{\mathrm{Tot}}$ at
$N = 0$. The offset results from the fact that $N_{\mathrm{Tot}}(N)$ only approaches the bounding lines
$N_{\mathrm{Tot}} = N$ and $N_{\mathrm{Tot}} = 30$ asymptotically. In Bode plots, the offset is rendered
insignificant by the large $b$ value, but a correction factor should be considered here because $b$ is
intended to be adjustable. The correction consists of vertically compressing the curve toward the
bounding line $N_{\mathrm{Tot}} = 30$ by the amount needed to have $N_{\mathrm{Tot}}(N)$ pass through the
origin. This amount is determined by evaluating the vertical offset of $N_{\mathrm{Tot}}(N)$ at $N = 0$.
In formal terms, the compression consists of a few linear geometric transformations: a downward
translation so that $N_{\mathrm{Tot}} = 30$ aligns with the x-axis, a vertical compression by the required
amount, and then an upward translation to reverse the downward translation. The result, shown in
Figure B-2, is

$N_{\mathrm{Tot}}(N) = N_0 \left[ 1 - \log_{(1 + b^{N_0})} (1 + b^{N_0 - N}) \right]$.

For simplicity, the following discussion uses the simpler form

$N_{\mathrm{Tot}}(N) = N_0 - \log_b (1 + b^{N_0 - N})$.


46 For sake of attribution, this formula is from electron mobility and velocity saturation. Awareness of such
origins is not necessary to appreciate the graphical behaviour.
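The two Bode-based forms can be compared numerically. The sketch below is illustrative only; the function names are hypothetical, and the defaults $N_0 = 30$ and $b = 1.2$ are taken from the Figure B-2 example.

```python
import math


def bode_total_weight(n: float, n0: float = 30.0, b: float = 1.2) -> float:
    """Uncorrected form: N0 - log_b(1 + b**(N0 - N))."""
    return n0 - math.log(1.0 + b ** (n0 - n), b)


def bode_total_weight_corrected(n: float, n0: float = 30.0, b: float = 1.2) -> float:
    """Corrected form: N0 * (1 - log_{1 + b**N0}(1 + b**(N0 - N))).

    The logarithm base 1 + b**N0 compresses the curve toward N_Tot = N0
    so that it passes through the origin.
    """
    return n0 * (1.0 - math.log(1.0 + b ** (n0 - n), 1.0 + b ** n0))


# Compare the two forms at N = 0 (where the offset appears) and elsewhere.
for n in (0, 10, 30, 100):
    print(n, bode_total_weight(n), bode_total_weight_corrected(n))
```

For $b = 1.2$ the uncorrected curve sits slightly below zero at $N = 0$, while the corrected curve is exactly zero there. Both approach 30 for large $N$.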


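Figure B-1. Example of sublinear (footnote 47) function $N_{\mathrm{Tot}}$ as a function of $N$, which
soft-limits at $N_0 = 30$: $N_{\mathrm{Tot}}(N) = N_0 N / (N_0 + N)$.


[Plot: $N_{\mathrm{Tot}}$ (vertical axis, 0 to $N_0 = 30$) versus $N$ (horizontal axis, 0 to 100).]


Figure B-2. Example sublinear function
$N_{\mathrm{Tot}}(N) = N_0 \left[ 1 - \log_{(1 + 1.2^{N_0})} (1 + 1.2^{N_0 - N}) \right]$,
which soft-limits at $N_0 = 30$.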


47 There are rigorous ways to define sublinearity, but here it refers to the behaviour of a single-input,
single-output function in the upper-right quadrant of a Cartesian graph (the only region of interest). The goal of
the  function  is  to  represent  diminishing  returns,  so  the  slope  is  always  positive,  and  always  decreases  as  the 
independent  variable  increases. 
The algebraic generation of a curve of diminishing returns that asymptotically approaches a limit
can be generalized very intuitively, in case the analyst wants greater flexibility in the sharpness of
the knee. As mentioned, such a curve asymptotically approaches the bounding lines
$N_{\mathrm{Tot}} = N$ for $N < N_0$, and $N_{\mathrm{Tot}} = N_0$ for $N > N_0$. The following discussion
assumes that the regime of interest is the upper-right quadrant of a Cartesian graph (independent and
dependent variables are positive).

A function $f(N)$ can be devised to approach bounding curves $g_1(N)$ and $g_2(N)$ by composing it as
$f(N) = \mathrm{amp}^{-1}\{\mathrm{amp}[g_1(N)] + \mathrm{amp}[g_2(N)]\}$, where amp is a superlinear
(footnote 48) “amplification” function. In regimes of $N$ away from their cross-over point, $g_1(N)$ and
$g_2(N)$ are expected to differ nontrivially, and amp’s superlinearity ensures that
$\mathrm{amp}[g_1(N)] + \mathrm{amp}[g_2(N)]$ is dominated by one of the two terms. The subsequent
application of $\mathrm{amp}^{-1}$ then yields an $f(N)$ that approximates the larger of $g_1(N)$ and
$g_2(N)$. Since $\mathrm{amp}[g_1(N)]$ and $\mathrm{amp}[g_2(N)]$ are additive, however, $f(N)$ will always
be above both $g_1(N)$ and $g_2(N)$. It is straightforward to show that
$N_{\mathrm{Tot}}(N) = g_1(N) + g_2(N) - f(N)$ will be below both $g_1(N)$ and $g_2(N)$, and asymptotically
approach the lower of the two, which is the desired behaviour.
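The composition described above can be sketched generically. In the illustration below, `soft_lower_envelope` is a hypothetical helper name, and the cubic amplification function and $N_0 = 30$ are chosen only as an example.

```python
from typing import Callable

Func = Callable[[float], float]


def soft_lower_envelope(g1: Func, g2: Func, amp: Func, amp_inv: Func) -> Func:
    """Build N_Tot(N) = g1(N) + g2(N) - f(N), with f = amp_inv(amp(g1) + amp(g2))."""
    def n_tot(n: float) -> float:
        # f lies above both bounding curves and approximates the larger one.
        f = amp_inv(amp(g1(n)) + amp(g2(n)))
        # Subtracting f from the sum leaves a curve below both bounds.
        return g1(n) + g2(n) - f
    return n_tot


N0 = 30.0
cubic = soft_lower_envelope(
    g1=lambda n: n,                    # bounding line N_Tot = N
    g2=lambda n: N0,                   # bounding line N_Tot = N0
    amp=lambda x: x ** 3,              # superlinear amplification
    amp_inv=lambda y: y ** (1.0 / 3.0),
)

for n in (0, 10, 30, 100):
    print(n, cubic(n))
```

Swapping in a different `amp`/`amp_inv` pair changes the sharpness of the knee without changing the bounding lines.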

For the Bode-based curve of Figure B-2, $N_{\mathrm{Tot}}(N)$ can be written
$N_0 + N - \log_b(b^{N_0} + b^{N})$,
which shows that $g_1(N) = N$, $g_2(N) = N_0 = 30$, $\mathrm{amp}(N) = b^N$, and
$\mathrm{amp}^{-1}(N) = \log_b N$. Figure B-3 shows another example with $\mathrm{amp}(N) = N^3$.
Figure B-1 is of a different algebraic form, but is well
approximated by $\mathrm{amp}(N) = N^{1.7}$; similar numerical dynamics prevail in that the limiting
boundaries are transformed into another domain (via arithmetic reciprocation, as can be easily
checked), in which one term dominates in an addition, before being transformed back to an
approximation of the dominant boundary.
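The reciprocation remark can be checked in one line. The sketch below takes "arithmetic reciprocation" to mean $\mathrm{amp}(x) = 1/x$ (which is its own inverse). That reading is an assumption, not a formula stated in the text.

```latex
\[
\mathrm{amp}(x) = \mathrm{amp}^{-1}(x) = \frac{1}{x}
\quad\Longrightarrow\quad
\frac{1}{N_{\mathrm{Tot}}(N)} = \frac{1}{N} + \frac{1}{N_0}
\quad\Longrightarrow\quad
N_{\mathrm{Tot}}(N) = \frac{N_0 N}{N_0 + N}.
\]
```

In the reciprocal domain the larger term belongs to the smaller of $N$ and $N_0$, so transforming back yields an approximation of the lower boundary directly, which is the Figure B-1 formula.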
Figure B-3. Example sublinear function
$N_{\mathrm{Tot}}(N) = N_0 + N - \left[ N_0^3 + N^3 \right]^{1/3}$,
which soft-limits at $N_0 = 30$.


48 Here, superlinear refers to the behaviour of a single-input, single-output function in the upper-right
quadrant  of  the  Cartesian  graph.  The  slope  is  always  positive,  and  always  increases  as  the  independent 
variable  increases. 
Choosing a power function $\mathrm{amp}(N) = N^M$ (Figure B-3) is algebraically simpler than the
exponential function of Bode-based curves because $\mathrm{amp}(0) = 0$, thus ensuring that there is no
offset at $N = 0$, and no correction factor is needed.

While  there  is  some  judgement  on  how  to  shape  the  curve  with  which  to  weight  a  multi-recipient 
email,  the  weighting  should  be  consistent  with  a  single-recipient  email.  The  above  monotonic 
total weightings closely approach 1 as $N \to 1$, and 0 as $N \to 0$.

For an $N$-recipient email, once the curve for the total weighting $N_{\mathrm{Tot}}$ is determined, the
per-recipient contribution to the SNA connection is simply $N_{\mathrm{Tot}}/N$ (Figure B-4).
Figure B-4. Per-recipient contribution to SNA connections
for the N-recipient email of Figure B-3.
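As a rough illustration of the per-recipient contribution for the cubic curve of Figure B-3, the following sketch divides the total weight by the number of recipients. The function names are hypothetical.

```python
def cubic_total_weight(n: float, n0: float = 30.0) -> float:
    """Total weight N_Tot(N) = N0 + N - (N0**3 + N**3) ** (1/3)."""
    return n0 + n - (n0 ** 3 + n ** 3) ** (1.0 / 3.0)


def cubic_per_recipient(n: float, n0: float = 30.0) -> float:
    """Per-recipient contribution N_Tot(N) / N, defined for N >= 1."""
    if n < 1:
        raise ValueError("an email needs at least one recipient")
    return cubic_total_weight(n, n0) / n


# The contribution starts near 1 for a single recipient and decays with N.
for n in (1, 5, 20, 50, 100):
    print(n, round(cubic_per_recipient(n), 4))
```

A single-recipient email keeps a weight very close to 1, which is the consistency requirement stated above.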
Annex C Interagency Stake Holders


The  following  stake  holders  were  culled  from  the  minutes  for  ASIWG  16-17  May  2006. 

C.1  Federal 

Canada Border Services Agency
Canadian Coast Guard
Canadian Ice Service
Canadian Security Intelligence Service
Canadian Space Agency
Citizenship and Immigration Canada
Department of National Defence:
    CFEC, Canada Command, Strategic Joint Staff, DRDC, JTFN
Department of Fisheries and Oceans
Environment Canada
Foreign Affairs and International Trade
Health Canada
Indian and Northern Affairs Canada
    Northwest Territories (NT), Nunavut (NU), Yukon (YT)
Industry Canada
International Polar Year Federal Program Office
Justice Canada
Northwest Territories Federal Council
National Energy Board
Natural Resources Canada
Public Safety and Preparedness Canada
Parks Canada
Public Health Agency of Canada
Royal Canadian Mounted Police:
    Northwest Region Immigration & Passport Section
    Divisions: G (NT), M (YT), V (NU)
Service Canada
Transport Canada
Yukon Federal Council

C.2  Provincial 

Government of Northwest Territories:
    Emergency Measures Organization
    Intergovernmental Relations
    Justice

Government of Nunavut:
    EMO (Nunavut Emergency Management)
    Intergovernmental Affairs

Government of Yukon:
    Emergency Measures Organization
    Intergovernmental Relations
    Justice


C.3  Ethnic 

Nunavut  Tunngavik  Inc. 

C.4  Municipal 

Town  of  Churchill 

C.5  Universities  (Political  Science  Professors  From): 

University  of  Calgary 
University  of  Toronto 
Annex D Data Acquisition Software


The  following  is  an  account  of  the  commercial  software  approaches  investigated  circa  Sept  2006. 
The  accuracy  of  information  on  commercial  products  is  limited  to  the  accuracy  with  which  the 
information  was  provided  in  consultations  with  the  vendors. 

D.1  Importing  Logs  Into  Database  for  Querying 

An  early  approach  considered  was  to  import  the  log  data  into  Excel®  and  convert  it  into  an 
Access  database.  Both  packages  are  available  on  DRENET,  thus  avoiding  the  administration  of 
obtaining  and  installing  non-standard  software.  The  database  could  then  be  queried  for  the 
volume  of  email  traffic  between  pairs  of  relevant  mailboxes.  Due  to  the  following  factors,  this 
approach  was  not  pursued. 

1. Only 65,536 records can be imported into an Excel® spreadsheet. At the time, it was not
known that the user base and log files were broken down on a per-server basis. Afterward,
however, it was found that the sample log file from just a single server contained 357,499
physical lines, thus rendering Excel® unusable as an intermediate application.

2.  The  Exchange  Server®  5.5  log  files  were  later  found  to  be  variable  length,  in  terms  of  the 
number  of  fields  per  record,  with  some  records  consuming  multiple  physical  lines.  Basically, 
the  information  for  each  sender  and  recipient  for  an  email  occupies  its  own  physical  line, 
while  an  empty  physical  line  terminates  the  record  for  the  email  as  a  whole  (Annex  El).  Since 
this data structure does not correspond to a table, it raises the question of whether it makes
sense  to  import  the  raw  log  data  into  a  spreadsheet,  or  even  a  database.  Instead,  what  is 
needed  is  a  data  preconditioning  phase  that  replaces  each  log  entry  for  a  one-to-many  email 
with  several  artificial  one-to-one  emails  (along  with  attenuation  weights,  as  per  Section  3.3). 
Unfortunately,  this  increases  the  record  count,  the  maximum  of  which  is  already  well 
exceeded. 

3. Aside from the questionable validity of treating the raw data as table data, the number of
records was too large for a spreadsheet even after accounting for the fact that one record consumes
multiple physical lines. The above sample log file for just a single server contained 103,663
multi-line records.

4.  At  the  time  that  Excel®  was  being  explored,  the  format  of  the  log  files  was  not  known.  It 
would  have  been  discovered  upon  later  examination,  however,  that  determining  the 
sender/recipient  IDs  would  require  further  nontrivial  pre-processing. 
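The data preconditioning phase mentioned in item 2 can be sketched as follows. This is an illustration only. The function name is hypothetical, and the attenuation weight uses the simple Annex B form $N_0/(N_0 + N)$ as a stand-in for whatever weighting Section 3.3 prescribes.

```python
from typing import List, Sequence, Tuple

WeightedPair = Tuple[str, str, float]


def expand_record(sender: str, recipients: Sequence[str], n0: float = 30.0) -> List[WeightedPair]:
    """Replace one one-to-many email with weighted one-to-one emails.

    Each recipient receives the same per-recipient weight N0 / (N0 + N),
    where N is the number of recipients (an assumed weighting).
    """
    n = len(recipients)
    if n == 0:
        return []
    weight = n0 / (n0 + n)
    return [(sender, recipient, weight) for recipient in recipients]


# One email to two recipients becomes two artificial one-to-one emails.
print(expand_record("alice", ["bob", "carol"]))
```

Because each one-to-many record becomes several rows, the record count grows, which is the effect noted in item 2.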
D.2 Use of Excel®’s PivotTable® in Multi-National Experiment (MNE) 4


Discussion  was  held  with  a  social  network  analyst  for  MNE  4  to  glean  lessons  learned49, 
particularly  regarding  methodology  and  tools.  The  MNE  4  SNA  differed  considerably  in  that  it 
examined  a  variety  of  communications  types.  The  source  data  involved  records  of 
communications from CFBLNet. Being an experiment-specific network, it is stood up for the finite
duration of an experiment and has a small overall user base. For MNE 4, there were
approximately  800  participants.  The  source  data  provided  for  the  SNA  encompassed 
communications  between  200  users.  It  still  involved  a  large  volume  of  data,  however,  as  well  as 
extensive  manual  preparation  so  that  the  data  could  be  read  into  Excel®.  The  PivotTable® 
feature  [11]  could  then  generate  input  suitable  for  SNA  software,  which  for  MNE  4  was 
UCINET,  Pajek,  and  Cyram’s  NetMiner. 

Even though Polar Guardian would examine email spanning a greater number of topics, it is
possible that Excel®’s PivotTable® could have been used after the voluminous logs were filtered to
leave only records of communication between mailboxes of interest. It seemed more direct,
however, to compile the traffic statistics using the same scripting step as that used to filter the
logs. Such an approach also eliminates the risk due to Excel® limitations in record count. As in the
database approach, however, the nontrivial issue of determining the sender/recipient IDs would
still need addressing regardless of whether PivotTable® or scripted compilation was used (this
was not apparent at the time when PivotTable® was being considered).
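Filtering and compiling in a single scripting step could look roughly like the sketch below. It assumes the sender and recipient IDs have already been determined and that each communication is available as a (sender, recipient, weight) tuple. The function name and data shapes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def compile_traffic(
    pairs: Iterable[Tuple[str, str, float]],
    mailboxes_of_interest: Iterable[str],
) -> Dict[Pair, float]:
    """Filter to mailboxes of interest and total the traffic in one pass."""
    interest = set(mailboxes_of_interest)
    volume: Dict[Pair, float] = defaultdict(float)
    for sender, recipient, weight in pairs:
        # Keep only communications where both ends are of interest.
        if sender in interest and recipient in interest:
            volume[(sender, recipient)] += weight
    return dict(volume)


traffic = compile_traffic(
    [("a", "b", 1.0), ("a", "b", 0.5), ("a", "x", 1.0), ("b", "a", 1.0)],
    mailboxes_of_interest={"a", "b"},
)
print(traffic)
```

The resulting dictionary plays the role of the pivot table: one traffic figure per ordered mailbox pair, with no limit on the number of input records.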

D.3  Quest®’s  MessageStats™ 

On  several  occasions,  DIMEI  suggested  using  the  commercial  package  MessageStats™  from 
Quest®  to  generate  reports  containing  the  input  data  for  Pajek  (Section  3.4,  item  3).  They  were 
also  quite  interested  in  MessageStats™  as  a  potential  tool  for  their  own  use  in  the  long  term.  A 
number  of  factors,  however,  suggested  that  this  was  not  the  tool  to  use  for  the  SNA  due  to  the 
short  time  frame  and  limited  financial  resources. 

1.  Discussion  with  Quest®  indicates  that  MessageStats™  needs  to  query  the  Active  Directory 
(AD)  on  the  DWAN  to  identify  the  sender/recipient.  They  are  of  the  position  that  the  log 
files  do  not  contain  these  details. 

2.  Despite  conversations  with  several  representatives  at  Quest®,  it  was  unclear  whether  an 
output  report  would  be  close  in  format  to  that  required  by  Pajek.  Confirmation  was  to  be 
provided  some  time  after  18  Sept  2006,  and  it  was  still  forthcoming  when  notice  about  the 
cessation  of  project  activities  was  provided  to  them  on  12  Oct  2006. 

3. The initial costing scheme yielded an unrealistic cost. The initial cost was $7.50 for each and
every mailbox in the domain of interest. As a special consideration, this was then reduced to
all mailboxes on all servers of interest. This still results in excessive cost for the final
interagency SNA, for which 60 servers were assumed, each serving 1000 mailboxes; 60,000
mailboxes would cost $450,000. According to DIMEI, MessageStats™’s kind of
functionality should cost in the order of several tens of thousands of dollars, with the absolute
upper limit approaching $100K. The only way for the initial cost from Quest® to fall within
this ballpark is if the mailbox count was over-estimated by an order of magnitude. Quest®
committed to provide more tenable pricing meant for corporate level purchases some time
after 18 Sept 2006; it was still outstanding as of 12 Oct 2006.

4.  MessageStats™  requires  the  use  of  SQL  Server,  enterprise  version  (version  2000  preferred). 
An  inquiry  was  submitted  on  18  Sept  2006  to  SEAMS  about  its  availability.  As  of  25  Oct 
2006,  SEAMS’s  plans  for  providing  database  services  consisted  of  intentions  to  conduct  a 
study  of  CFEC’s  needs.  For  the  SNA  in  the  immediate  term,  it  was  not  clear  from  Quest® 
whether  MSDE  could  be  used  (free,  lighter  version  of  SQL  Server);  it  was  not  recommended 
for  further  exploration,  however,  due  to  the  large  volume  of  data  expected.  The  size  of  the 
database  created  by  MessageStats™  depends  primarily  on  the  number  of  mailboxes  hosted  by 
the  servers  whose  log  files  are  processed  rather  than  the  number  of  mailboxes  of  interest.  It 
was  not  clear,  however,  how  large  the  database  would  be. 
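The cost arithmetic in item 3 can be laid out explicitly. The 6,000-mailbox figure in the last line is a hypothetical value, used only to illustrate a count ten times smaller than the assumed 60,000.

```latex
\begin{align*}
60 \times 1000 \times \$7.50 &= \$450{,}000 \\
\$100{,}000 \div \$7.50 &\approx 13{,}333 \text{ mailboxes} \\
6{,}000 \times \$7.50 &= \$45{,}000
\end{align*}
```

At $7.50 per mailbox, the $100K upper limit corresponds to roughly 13,333 mailboxes, and a count one tenth of the assumed 60,000 gives a cost in the tens of thousands of dollars.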

Due  to  DIMEI’s  recommendation  of  this  application,  EXORT  decided  to  reconsider  its  use  when 
further  information  was  provided,  especially  with  respect  to  cost  and  confirmation  of  functional 
suitability.  This  includes  the  following,  many  of  which  were  forthcoming  as  of  12  Oct  2006. 

1. Confirmation of suitable output format.

2.  Confirmation  that  generating  N  metrics  of  traffic  volume  for  N  mailbox  pairs  of  interest  will 
not  require  N  queries. 

3.  Confirmation  that  proper  compilation  of  data  can  be  done  when  the  log  files  come  from 
servers  on  different  domains. 

4.  An  idea  of  the  size  of  the  database  generated,  and  resources  required. 

5.  Demo  of  its  capabilities,  which  had  been  proposed  by  both  DIMEI  and  Quest®  a  number  of 
times.  Quest®  has  tied  the  demo  to  DIMEI’s  continuing  involvement  and  dialogue,  despite 
DIMEI’s  request  that  the  demo  be  driven  by  the  SNA’s  requirements.  EXORT  also  requested 
of  Quest®  that,  for  the  demo,  the  interest  arising  from  the  immediate  needs  of  Polar  Guardian 
be decoupled from DIMEI’s interest in meeting future corporate needs. DIMEI has grown
silent  on  the  matter  of  the  demo.  Considering  the  cessation  of  Polar  Guardian  and  the  lack  of 
information  about  functional  suitability  and  cost  effectiveness,  the  MessageStats™  demo  was 
not  pursued. 

MySQL was mentioned in in-house discussions about MessageStats™’s requirements for an
SQL  database.  If  and  when  MessageStats™  started  to  look  like  a  viable  alternative,  MySQL’s 
suitability  could  be  explored  further. 

D.4  PROMODAG™  Reports 

PROMODAG™  Reports  requires  access  to  the  DWAN’s  Active  Directory  (AD)  to  identify  the 
users  in  the  tracking  logs.  It  also  requires  SQL  Server.  It  is  not  clear  how  it  handles  servers  from 
different  organizations.  PROMODAG™  is  another  vendor  whose  position  is  that  the  tracking 
logs do not contain sufficient information with which users can be identified, and that what data is
present  takes  varying  forms. 


D.5  Waterford  Technologies’  MailMeter  Insight 

There  was  no  response  from  Waterford  Technologies  about  its  suitability  for  the  SNA’s 
requirements. 

D.6  Symantec™’s  BindView™ 

According  to  a  representative,  this  product  does  not  satisfy  the  SNA’s  functional  requirements. 
He  suggested  a  scripted  data  processing  approach,  such  as  VB  or  Perl.  Since  BindView™  was 
recently  acquired  by  Symantec™,  it  could  be  confusing  to  establish  a  POC  for  product 
information. 


D.7  Morphix’s  MetaSight® 

MetaSight is an SNA package that requires no special pre-processing of log files, since it
polls the servers directly. This manner of operation makes it unsuitable for Polar Guardian’s
SNA, since EXORT has access only to the servers’ log files. The costs are also quite high.
Simply trying it out for 3 months costs $50K, while full usage costs $180K/year, plus an additional
$50K/server outside of the host organization (DND). SQL Server and IIS (a Microsoft™ web
server) are also required.

D.8  Orgnet’s  InFlow 

InFlow performs SNA. It does not read email log files to compile input data. Hence, it
fits  the  same  part  of  the  analysis  chain  as  Pajek  and  requires  the  same  pre-processing  being 
sought. 
Annex E Perl References


Annex F Exchange Server® 5.5 Tracking Log and Events


The  following  tracking  log  description  was  obtained  from  Microsoft™  support,  and  is  accessible 
online  [17].  Study  of  example  logs  indicated  some  errors  in  these  specifications.  For  example, 
field 11 (cost) rarely has value 1, though apparently, it should always be 1. The number of
recipients  is  in  field  13  rather  than  12.  Each  recipient  name  is  not  tab  delimited  as  implied  (it  is 
preceded  by  a  “newline”  for  each  recipient). 

F.1  Tracking  Log 

The  tracking  log  is  stored  in  Exchsrvr\tracking.log.  Each  day,  a  new  log  is  created  that  records 
one  day's  activities  on  the  server.  Each  daily  log  is  named  by  the  date  on  which  it  was  created,  in 
yyyymmdd.log format. The file name date, like all time in the tracking log, is in UTC.

The  log  can  be  displayed  in  any  text  editor,  imported  into  spreadsheets  such  as  Microsoft  Excel, 
or  used  as  input  data  to  custom  applications. 

Activities  recorded  in  the  tracking  log  often  include  a  message  ID,  which  is  a  unique  message 
identifier.  By  searching  the  tracking  log  for  the  message  ID,  you  can  follow  the  message  as  it  is 
handled  and  transported  within  the  site. 

The  Microsoft  Exchange  Server  Administrator  program  includes  an  automated  message  tracking 
process.  The  Track  Message  command  traces  messages  through  all  existing  logs  in  the  network. 
You  can  use  this  process  instead  of  attempting  a  manual  search  of  the  logs. 

Interpreting  Tracking  Log  Fields 

The  following  table  describes  the  tab-separated  columns  in  the  tracking  logs. 


| Field # | Field Name | Description |
| --- | --- | --- |
| 1 | Message ID or MTS-ID | Message ID is a unique identifier assigned to the message by Microsoft Exchange Server. It stays with the message from its origination to delivery or transfer from the network. Messages from foreign systems include a message transfer system-ID (MTS-ID) that uniquely identifies the component that transported the message. |
| 2 | Event # | Represents the event type. For event details, see "Interpreting Events" later in this chapter. |
| 3 | Date/Time | Date and time of the event UTC. |
| 4 | Gateway name | Name of the gateway or connector that generated the event. If no gateway was involved, the field is blank. |
| 5 | Partner name | Name of the messaging service associated with the event. In Microsoft Exchange Server, the partner is the MTA or the information store. |
| 6 | Remote ID | Message ID used by the gateway. |
| 7 | Originator | Distinguished name of the originating mailbox, if known. |
| 8 | Priority | Priority set by the sender. 0 = Normal, 1 = High, -1 = Low. |
| 9 | Length | Message length in bytes. |
| 10 | Seconds | Transport time in seconds. Not used by Microsoft Exchange Server. The value in this field is 0 or blank. |
| 11 | Cost | Cost per second for message transfer. Not used by Microsoft Exchange Server. The value in this field is always 1. |
| 12 | Recipients | Number of recipients. |
| 13 | Recipient name | Distinguished name of the recipient of the message or a proxy address. This field is separated from the previous field by a line feed. This field is repeated for each recipient. |
| 14 | Recipient report status | A number representing the result of an attempt to deliver a report to the recipient. Delivered = 0, Not delivered = 1. This is used only for reports. On other events, it is blank. This field is repeated for each recipient. |
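Reading the multi-line records described above could be sketched as follows. The layout assumed here is taken from this memorandum's description: a tab-separated originator line, one physical line per recipient, and an empty line terminating each record. The class and function names are hypothetical, and the field positions used (field 2 for the event, field 7 for the originator) follow the table above and may need adjusting against real logs.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class TrackingRecord:
    header: List[str]                       # tab-separated fields of the originator line
    recipients: List[str] = field(default_factory=list)

    @property
    def event(self) -> int:
        return int(self.header[1])          # field 2: Event #

    @property
    def originator(self) -> str:
        return self.header[6]               # field 7: Originator


def parse_records(lines: Iterable[str]) -> Iterator[TrackingRecord]:
    """Group physical lines into records; an empty line ends a record."""
    current: Optional[TrackingRecord] = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            if current is not None:
                yield current
                current = None
            continue
        fields = line.split("\t")
        if current is None:
            current = TrackingRecord(header=fields)
        else:
            # Recipient line: the name comes first, then the report status.
            current.recipients.append(fields[0])
    if current is not None:
        yield current
```

A typical use would be `for record in parse_records(open(path)): ...`, after which each record can be expanded into sender/recipient pairs.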


F.2  Interpreting  Events 

The  following  table  defines  event  numbers  that  appear  in  tracking  logs. 


| Event # | Event Type | Description |
| --- | --- | --- |
| 0 | Message transfer in | The MTA completed transfer of responsibility for a message from a gateway, X.400 link, or MTA into the local MTA. |
| 1 | Probe transfer in | The MTA completed transfer of responsibility for a probe from a gateway, X.400 link, or MTA into the local MTA. |
| 2 | Report transfer in | The MTA completed transfer of responsibility for a report from a gateway, X.400 link, or MTA into the local MTA. |
| 4 | Message submission | A message was submitted by a local e-mail client (usually through the information store). |
| 5 | Probe submission | An X.400 probe was submitted by a local e-mail client (usually through the information store). |
| 6 | Probe transfer out | The MTA completed transfer of responsibility for a probe from the local MTA to a gateway, X.400 link, or another MTA. |
| 7 | Message transfer out | The MTA completed transfer of responsibility for a message from the local MTA to a gateway, X.400 link, or another MTA. |
| 8 | Report transfer out | The MTA completed transfer of responsibility for a report from the local MTA to a gateway, X.400 link, or another MTA. |
| 9 | Message delivered | The MTA completed delivery of a message to local recipients (usually through the information store). |
| 10 | Report delivered | The MTA completed delivery of a receipt or NDR to local recipients (usually through the information store). |
| 26 | Distribution list expansion | The MTA has expanded a distribution list to produce a new message that has recipients who are distribution list members. |
| 28 | Message redirected | The MTA has redirected a message or probe to an alternate recipient because of incorrect configuration data for the original recipient, or failure to route the object or reassignment of data contained in the message. |
| 29 | Message rerouted | The MTA has rerouted a message, report, or probe because of problems with next route X.400 link or MTA. |
| 31 | Downgrading | The MTA has mapped a message, report, or probe into the 1984 X.400 protocol before transferring it to a remote 1984 MTA. |
| 33 | Report absorption | The MTA has scheduled a report for deletion because the user did not request it. In X.400 protocol, NDRs are always routed back to the sending MTA even if the user did not request a report. |
| 34 | Report generation | The MTA has created a delivery receipt or NDR. |
| 43 | Unroutable report discarded | The MTA has discarded a report because the report cannot be routed to its destination. |
| 50 | Gateway deleted message | The administrator deleted an X.400 message that was queued by the MTA for transfer to a gateway. No delivery report is generated. |
| 51 | Gateway deleted probe | The administrator deleted an X.400 probe that was queued by the MTA for transfer to a gateway. No delivery report is generated. |
| 52 | Gateway deleted report | The administrator deleted an X.400 report that was queued by the MTA for transfer to a gateway. No delivery report is generated. |
| 1000 | Local Delivery | The sender and recipient are on the same server. |
| 1001 | Backbone transfer in | Mail was received from another Messaging Application Programming Interface (MAPI) system across a connector or gateway. |
| 1002 | Backbone transfer out | Mail was sent to another MAPI system across a connector or gateway. |
| 1003 | Gateway transfer out | The message was sent through a gateway. |
| 1004 | Gateway transfer in | The message was received from a gateway. |
| 1005 | Gateway report transfer in | A delivery receipt or NDR was received from a gateway. |
| 1006 | Gateway report transfer out | A delivery receipt or NDR was sent through a gateway. |
| 1007 | Gateway report generation | A gateway generated an NDR for a message. |
| 1010 | SMTP Queued Outbound | Outbound mail was queued for delivery by the Internet Mail Service. |
| 1011 | SMTP Transferred Outbound | Outbound mail was transferred to an Internet recipient. |
| 1012 | SMTP Received Inbound | Inbound mail was received by the Internet Mail Service. |
| 1013 | SMTP Transferred Inbound | Mail received by the Internet Mail Service was transferred to the Information Store. |
| 1014 | SMTP Message Rerouted | An Internet message is being rerouted or forwarded to the proper location. |
| 1015 | SMTP Report Transferred In | A delivery receipt or NDR was received by the Internet Mail Service. |
| 1016 | SMTP Report Transferred Out | A delivery receipt or NDR was sent to the Internet Mail Service. |
| 1017 | SMTP Report Generated | A delivery receipt or NDR was created. |
| 1018 | SMTP Report Absorbed | The receipt or NDR could not be delivered. |
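When processing the logs by script, the event numbers above can be used to separate report traffic (receipts and NDRs) from ordinary messages. The sketch below is illustrative only. The set of report events is read off the table above, the names are hypothetical, and whether report events should be excluded from an SNA is an analyst's choice rather than something the specification prescribes.

```python
# Event numbers whose description in the table refers to reports, receipts or NDRs.
REPORT_EVENTS = frozenset(
    {2, 8, 10, 33, 34, 43, 52, 1005, 1006, 1007, 1015, 1016, 1017, 1018}
)

# A few event names from the table, for readable diagnostics.
EVENT_NAMES = {
    4: "Message submission",
    9: "Message delivered",
    10: "Report delivered",
    26: "Distribution list expansion",
    1000: "Local Delivery",
}


def is_report_event(event: int) -> bool:
    """Return True if the event concerns a report rather than a message."""
    return event in REPORT_EVENTS


def describe(event: int) -> str:
    """Return the event name if listed above, else a generic label."""
    return EVENT_NAMES.get(event, f"Event {event}")


for event in (4, 9, 10, 1000, 1018):
    print(event, describe(event), is_report_event(event))
```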
Annex G Exchange Server® 2003 Tracking Log and Reference to Events


The  following  tracking  log  description  was  obtained  from  DIMEI.  Parallel  information  for 
Exchange  2000  Server  is  available  online  [18],  and  it  appears  to  be  identical  in  content  to  that  for 
Exchange  Server®  2003.  Information  on  Event  IDs  for  Exchange  Server®  2003  [19]  became 
available  during  the  final  revisions  of  this  note. 

Exchange  2003  Tracking  Logs  Fields 

The  tracking  log  file  is  stamped  with  the  following  information  at  the  very  start  of  the  file. 

#  Message  Tracking  Log  File 

#  Exchange  System  Attendant  Version  6.5.xxxx 

The  following  is  a  list  of  all  the  information  available  in  columnar  form  in  the  tracking  log  file: 


Field   Field name               Description

1       Date                     Date of the event.

2       Time                     Greenwich Mean Time of the event.

3       Client-IP                IP of connecting client.

4       Client-hostname          Hostname of connecting client.

5       Partner-name             Name of the messaging service that the message is
                                 handed off to. In Exchange 2000, the service can be:
                                 SMTP, X400, MAPI, IMAP4, POP3, STORE. This is
                                 essentially the same as Exchange Server 5.5, but in
                                 Exchange 2000, there are more possibilities for this
                                 field.

6       Server-hostname          Hostname of the server that is making the log entry.

7       Server-IP                IP of the server that is making the log entry.

8       Recipient-address        Message recipient (SMTP or X.400 address).

9       Event-ID                 Integer corresponding to the Event ID of the action
                                 logged, for example: sent, received, delete, retrieve.

10      MSGID                    Message ID.

11      Priority                 The priority is represented by -1 if low, 0 if normal,
                                 1 if high.

12      Recipient-Report-Status  A number representing the result of an attempt to
                                 deliver a report to the recipient: 0 if delivered, 1 if
                                 not delivered. This is used only for reports
                                 (non-delivery reports [NDRs], delivery receipts [DRs]).
                                 On other events, it is blank.

13      Total-bytes              Message size (in bytes).

14      Number-recipients        Total number of recipients.

15      Time-taken               Delivery time (in seconds) representing the time it
                                 takes to deliver the message. Determined from the
                                 difference between the timestamp and the time encoded
                                 in the Message ID. Only valid for messages within the
                                 Exchange organization (all versions); there is no
                                 requirement to decode other product message IDs
                                 such as Sendmail, and so on.

16      Encryption               For the primary body part: 0 if no encryption, 1 if
                                 signed only, 2 if encrypted. This is per message, not
                                 per recipient.

17      Service-version          Version of the service making the log entry.

18      Linked-MSGID             If there is a MSG ID from another service, it is
                                 given here to link the message across services.

19      Message-subject          The subject of the message, truncated to 256 bytes.

20      Sender-address           Primary address of the originating mailbox, if
                                 known. This could be SMTP, X.400, or
                                 Distinguished Name (DN), depending on transport.
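As an illustration of how the 20 columns listed above might be read programmatically, the following Python sketch splits one record into named fields. It assumes the log is tab-delimited with the fields in the order listed and that header lines begin with "#"; neither assumption is confirmed by the table itself. The function name parse_tracking_line and the sample record are hypothetical and are not taken from an actual log.

```python
# Field names in the order given in the table above (fields 1-20).
FIELD_NAMES = [
    "date", "time", "client_ip", "client_hostname", "partner_name",
    "server_hostname", "server_ip", "recipient_address", "event_id",
    "msgid", "priority", "recipient_report_status", "total_bytes",
    "number_recipients", "time_taken", "encryption", "service_version",
    "linked_msgid", "message_subject", "sender_address",
]

# Code values as described for fields 11 and 16.
PRIORITY = {"-1": "low", "0": "normal", "1": "high"}
ENCRYPTION = {"0": "none", "1": "signed only", "2": "encrypted"}


def parse_tracking_line(line):
    """Return a dict for one record, or None for header or blank lines.

    Assumption: fields are separated by tabs.
    """
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    values = line.split("\t")
    # Pad short records so that every field name is present.
    values += [""] * (len(FIELD_NAMES) - len(values))
    record = dict(zip(FIELD_NAMES, values))
    record["priority_label"] = PRIORITY.get(record["priority"], "unknown")
    record["encryption_label"] = ENCRYPTION.get(record["encryption"], "unknown")
    return record


# Made-up sample record, for illustration only.
sample = "\t".join([
    "2008-1-15", "14:03:22 GMT", "10.0.0.5", "client1", "SMTP",
    "server1", "10.0.0.1", "user2@example.org", "1020",
    "msg-0001@example.org", "0", "", "2048", "1", "0", "0",
    "6.5", "", "Test subject", "user1@example.org",
])
record = parse_tracking_line(sample)
print(record["sender_address"], "->", record["recipient_address"])
```

Keeping the raw string values and adding separate label fields leaves the original data untouched, which matters when the log content does not match the specification.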
Annex  H  Exchange  Server®  5.5  Tracking  Log  Example 
Records 


The following are selected records from the Exchange Server® 5.5 example tracking log, illustrating some of the harder-to-use identification data. Delimiting tabs are shown as solid right-arrowheads. “Newlines” are shown as a scripted backward “P”. Sender identity data is underlined. Data for each recipient occupies its own line, immediately following the originator line. Inconsistent use of commas and spacing is preserved, as is upper/lower case. Annex F notes discrepancies between the tracking log data and the specifications.

Anonymization  measures  are  as  follows.  Square  brackets  show  optional  content. 

•  Actual  names  have  been  replaced  by  generic  names,  though  dashes  and  “O’  ”  are  retained 

• Some alphanumeric field values have been replaced with italicized example strings, each of which is optionally appended with integers, e.g. EXAMPLESTRINGn, EXSTRn, XSTRn, SOMECITYn, SOMETOWN, ORGn, somenet1, etc.

• Some decimal digits have been replaced by D

• Four-digit years have been replaced with YYYY; month/day decimal digits have been replaced by D

• Times have been replaced by H[H]:M[M]:S[S], where the number of digits has been preserved.
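The date and time masking rules above can be sketched in Python as follows. This is an illustration only, not the procedure actually used to prepare the records; the function names mask_time and mask_date are hypothetical, and the "." date separator is an assumption based on the appearance of the masked records.

```python
def mask_time(value):
    """Mask an H:M:S time, keeping the number of digits in each part."""
    parts = value.split(":")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not an H:M:S time: {value!r}")
    # Repeat the letter H, M or S once per digit in the matching part.
    return ":".join(letter * len(p) for letter, p in zip("HMS", parts))


def mask_date(value, sep="."):
    """Mask a year/month/day date: the year becomes YYYY, other digits D."""
    parts = value.split(sep)
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a year{sep}month{sep}day date: {value!r}")
    year, month, day = parts
    if len(year) != 4:
        raise ValueError(f"expected a four-digit year: {value!r}")
    return sep.join(["YYYY", "D" * len(month), "D" * len(day)])


print(mask_time("9:05:33"))
print(mask_date("2008.1.15"))
```

Preserving the digit counts keeps irregularities such as missing leading zeros visible in the anonymized data.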

C=CA;A =ORG4.  CO  UNTR  Y;P=OR  G3;L=EXAMPLESTR1NG5  ►  0  ►  YYYY.D.DD 

H.M.SS^-  ►/0=0i?G7.0i?G2/0U=A5Ti?7/CN=C0NFIGURATI0N/CN=SERVERS/CN=FX4Af 

PLES  T R ING 1  /C  N =SOM. E  CORE  1 

MTA^  ^C=CA:A=ORG4.COUNTRY:P=ORG3:DDA:SMTP=EmailUserNamel(a)somenetl.ors 

Tca^O  ►  8214^-0  ►O  ►►  31 

/o =ORGl .  ORG2/ou=XSTR  I /cn=Rccipicnts/cn=+XMPLSTRING 
/o =ORGl . OR G2/o\x=XS T/N  /'c  n = R  c  c  i  p  i  c  n  ts/cn =L4.S TNA ME  1 ,  FIRSTNAME1  3891 
/o =ORGl . ORG2/ou=XSTRl/cn=Recipientslcn=LASTNAME2,  FIRSTNAME2  0971 
1 

C=CA;A =ORG4.  COUNTR  Y;P=ORGl .  ORG2;L=EXAMPLESTRING2  ►  4  ►  YYYY.D.DD 
H:  MM:  SS  ►  ►  10= OR  G 1  .OR  G2/0  U  =XS  TR1  /C  N =CO  N  F I G  U  R  A  T 1 0  N/  C  N = S  E  R  V  E  R  S/C  N =EXA 
M PLES  TR ING4/C  N =SOME  COR  P 1  PRIVATE 

MDB  ►  ►  / o=ORGl . ORG2/ou=XS 7R7/cn=RECIPIE NT S/ cn=LASTNA 3 IE 3.  FIRSTNAME3 
@931  ►O^  1550^0^0^  ►  111 

C=CA;A =ORG4.  COUNTRY;P=ORGl .  0RG2;0=XSTRP,DDA:SMTP=EmailUserNcmie2(a)exa.m 
plestring26.com;l 

1 

c=CA;a=ORG4.  CO  UNTR  Y;p=OR  G1 .  OR  G2\\=EXAMPLE STRING  3  ►  1 000  ►  YYYY.D.DD 
HH:MM:SS^-/o=ORG1.0RG2/ou=XSTRl/cn=Configmation/cn=Seryers/cn=EXAMPLESTRING 
4/cn=Somecorpl  Private 

MDB^- /o=ORGl  .ORG2/ou=XSTRl/cn=Configmation/cn=Servers/cn=EXAMPLESTRING4/cn=S 
omecorpl  Private 
MDB^  >/0=0RG1.0RG2/0V=XSTRl/CN=REClPlEmStCN=LASTNAAIE4.  FIRSTNAME4 
@947^0 ►  1712^0^  !►  ►  2^f 

/  0=ORGl .  ORG2I  OU=A5T7?7/CN=RECIPIENTS/ CN=+XSTR1  1 90005^ 

/O-ORGl . ORG2I OU=A5T7?7/CN=RECIPIENTS/ CN=Lastname3 1 ,  Firstname31  4461 

1 

C=C  A  ;A=OR  G4.  CO  UNTR  Y;  P=OR  G 1 .  OR  G2 ;  L=EXA  MPLESTIUNG6  ►  8  ►  YYYY.D.DD 
HH:MM:SS ►  ►  /O =ORGl .  ORG2/OU=XSTR //CN=CONF  I GURATION/CN=SERVERS/CN=EX 
A  MPLES  T R ING 1  /C  N =SOMECOR  P 1 

MTA  ►  C=CA:  A-ORG4.  COUNTRY;P=ORGl .  ORG2;L=EXAMPLESTRING7*>/o=ORGl.QRG2/ 
ou=XSTR7/cnMlECIPIENTS/cn=?GRG7  EXAMPLE  TEXT  NOT  REAL 
►  1^3596^0^0^^71 

/o=ORGl  .ORG2/ou=XSTRl /cn=KECIPlENTS/cn=LASTNAME5,  FIRSTNAME5  276  ►  11 
/o=ORGLORG2/ou=XSTRl/cn=KEClPlENTS/cn=LASTNAME6,  FIRSTNAME6  771  ►  11 
/o=ORG1.0RG2/ou=XSTRl/cn=REClPlEmS/cn=LASTNAME7,  FIRSTNAME7  062  ►  11 
/o=ORG1.0RG2/ou=XSTRl/cn=P£ClPlENTS/cn=LASTNAME8,  FIRSTNAME8  346 ►  11 
/o=OR G1 .  OR G2/ou=ASTR  7/cn=RECIPIENTS/cn=iM STNAME9,  FIRSTNAME9  @473  ►  11 
/o=ORGl . ORG2/ou=XSTRl/cn=REClPlENTS/cn=LASTNAMEl 0,  FIRSTNAME1 0  590  ►  11 
/o=OR G 1  .OR G2/o\x=XS TR1  /c n = R E Cl P 1 E N T S/cn =ZA STNA ME  1 9,  FIRSTNAME1 9  964 ►  11 
1 

C=CA;A =ORG4.  COUNTRY;P=ORGl  .ORG2X=EXAMPLESTRING14 ►  0  ►  YYYY.D.DD 
HH.MM.SS^-  ►/0=GRG7.GRG2/0U=A5TR7/CN=C0NFIGURATI0N/CN=SERVERS/CN=£X 
A  XI PL  ES  TR/NG1  /C  N =SOMECOR  P 1 

MTA  ►  ►  /o=ORG7.GRG2/ou=ALERT/cn=RECIPIENTS/cn=XSTRNG  1  ►  0  ►  36093  ►  0  ►  0  ► 

►in 

/o =ORGl . OR G2/o\i=XS 77?  7  /cn=RECIPIENT S/cn= LA STNA XI E 1 1,  FIRSTNAME1 1  5091 

1 

C=C  A  -A--OR  G4.  CO  UNTR  Y;  P=OR  G 1 .  OR  G2 ;  L=EXA  MPLESTRING  /  5  ►  0  ►  KEPT  D.  DO 
HH:MM:SS ►  ►  /O =ORGl .  GRG2/OU=ASTR7/CN=CONFIGURATION/CN=SERVERS/CN=£X 
AMPLESTRlNGl/CN=SOMECORPl 

MTA^  ^-/o=ORGl  .ORG2/ou=SOMECITY4/cn=REClPlENTS/cn=Q'LASTNAME12  CAPT 
MM (a)XMPLSTRINGl  ►  1  ►  23 1 40  ►  0  ►  0  ►  ►  21 

/o=ORG1.0RG2/ou=XSTRl/cn=REClPlENTS/cn=LASTNAME13,  FIRSTNAME13  7931 
/o=ORGLORG2/o\i=XSTRl/cn=PEClPlENTS/cn=LASTNAME14,  FIRSTNAME14  S8471 
1 

C=C  A  -A--OR  G4.  CO  UNTR  Y;  P=OR  G I .  OR  G2 ;  L=EXA  MPLESTRING8  >■!>■  YYYY.D.  DO 
HH:MM:SS ►  ►  /O =ORGl .  GRG2/OU=ASTR7/CN=CONFIGURATION/CN=SERVERS/CN=£X 
AMPLESTR1NG1  /CN=SOMECORPl 

MTA^  ►  /o =ORGl .ORG2/ou-XSTRI/cn-RYC\P I EN  I  S  cn  LASTS  I ME1 5.  FIRST-NAME  1 5 
655  ►0^  1574^0 ►O^  ►  1U 

/o=ORG1.0RG2/ou=SOMECITYl/cn=REClPlENTS/cn=LASTNAMEl  6,FIRSTNAME1 6  1 921 

1 

C=CA;A=ORG4.  COUNTR  Y;  P=0RG1.  ORG2;L=EXAMPLESTRING16  ►  9  ►  YYYY.  D.  DO 
HH.MM.SS ►  ►  /O =ORGl .  GRG2/OU=X57R7/CN=CONFIGURATION/CN=SERVERS/CN=£X 
A  M PL  ES  PR  ING4/C  N =SOME  COR  P 1  PRIVATE  MDB^  ►  /o=ORGl-ORG2/ou=SOME 
ADMINISTRATIVE 

GROUP lcn=YYEC\P\ENTlSlcn=LASTNAMEl 7. INITIALS  17^-0 ►3144 ►0^-Q^  ►  11 
/o=ORG1.0RG2/ou=XSTRl/cn=REClPlENTS/cn=LASTNAME18,  FIRSTNAME18  9851 
1 
C=CA;A =ORG4.  COUNTRY', P=ORGl .  OR  G2;L=EXAMPLESTRING9  ►  4  ►  YYYY.D.DD 
HH:MM:SS ►  ►  /O =ORGl .  0/?G2/OU=V5T/G/CN=CONFIGURATION/CN=SERVERS/CN=£’V 
A  MPLES  TR ING4/C  N =SOMECOR  P 1  PRIVATE 

MDB^  ►  io-QRGl .  OR  G2io  u ~XS TR  1  ,  c  n ~ R  EC  I P I E N  T S/c  n~  LA  S TNA  M E20,  FIRSTNAME20 
996^0^8012^0^0^^6H 

!o=OR G l .OR G2/ou=TXS T R ING l/cr\ =Som e  Users! cn=lastname2 1  .initials2  l@org3  .gc.ca^ 
!o=ORGl  .ORG2/ou=EXSTRINGl  !cn=Some  Users/cn=lastname22.initials22@org3.gc.ca 1 
!o=ORGl  .ORG2/ou=EXSTRINGl  !cn=Sotne  Users/cn=lastname23.initials23@org3.gc.ca 1 
/o =0RG1 . OR G2/ou=XS 77?  7  /cn=RECIPIENT S/cn= LA STNA ME2 4,  FIRSTNAME24  6061 
/o =ORGl . ORG2/ou=XSTR 7/cn=RECIPI  ENTS/cn=LASTNAME25,  FIRSTNAME25  0 1 61 
!o= OR G 1  .OR G2/ou=EXS T R ING  1  ic n  =Som e  Users/cn=lastname26.initial26@org3.gc.ca*\ 

1 

C=CA;A =ORG4.  COUNTR  Y;P=ORGl .  ORG2;L=EXAMPLESTRINGl  7>  9  ►  YYYY.D.DD 
HH:MM:SS  ►  ►  IO=ORG  1 .  ORG2/OU=XSTR  //CN=CO  N  F I G  U  R  AT  I O  N/C  N=S  E  R  V  E  R  S/CN=EX 
A  MPLESTR ING4/C N =SOME COR P I  PRIVATE 

MDB^  ►  /o =ORGl . ORG2/ou=XSTRl/cn=SOME  RECIPIENTS/cn-L4S77V.  IME2 7, 
FIRSTNAME2  7@524  ►  0  ►  1 978  ►  0  ►  0  ►  ►  11 

/o=ORG1.0RG2lou=XSTRl/cn=KECmENTS!cn=LASTNAME28,  FIRSTNAME28  2371 

1 

C=CA;A =ORG4.  COUNTRY;P=ORGl .  ORG2;L=EXAMPLE STRING  1 0  ►  4  ►  YYYY.D.DD 
HH:MM:SS ►  ►  /O =ORGl .  G7?G2/OU=XS77?7/CN=CONFIGURATION/CN=SERVERS/CN=£X 
A  M. PL  ES  TR  ING4/C  N =SO  ME  COR  P I  PRIVATE 

MDB  ►  ►  C=CA:A=ORG4.COUNTRY:P=ORG3:S=Surname29:G=GivenName29:l=INITIALS29: 

►  0  ►  8093  ►  0  ►  0  ►  ►  21 

/o =ORGl . ORG2/ou=XSTRl/cn=KECmENTS/cn=LASTNAME34,  FIRSTNAME34  8741 
/o =ORGl . ORG2/o\i=XSTRl/cn=KECmENTS/cn=LASTNAME35,  FIRSTNAME35  1611 
1 

C=C  A  -A-OR  G4.  CO  UNTR  Y;  P=OR  G 1 .  OR  G2 ;  L=EXA  MPLESTR  ING  /  rV  ►  0  ►  YYYY.D.  DO 
HH.MM.S^-  ►  /0=G7?G7.G7?G2/0U=XS77?7/CN=C0NFIGURATI0N/CN=SERVERS/CN=£X 
A  MPLES  TR  ING2  4/C  N  =S OM E COR  P I 

MTA^  ►  /o=G7?G7.G7?G2/ou=XS77?7/cnMlECIPIENTS/cnM 23456789  ►  0  ►  3497  ►  0  ►  0  ►  ►  2 

1 

/o =ORGl .  ORG2/o\i=XSTRl/cn=RECmENTS/cn=LASTNAME36,  FIRSTNAME36  6991 
/o =ORGl . ORG2/ou=XSTRl/cn=RECmENTS/cn=LASTNAME3  7,  FIRSTNAME3 7  2291 
1 

c=CA;a=ORG4.COUNTRY;v=ORGl  .ORG2;l=EXAMPLESTRINGl  1  ►  1 000  ►  YYYY.D.DD 
HH:M:SS^-/o=ORGl .  ORG2/ou=XSTRl/cn=Configumtion/cn=Serwevs/cn=EXAMPLESTRING4/c 
n=Somecorpl  Private 

MDB  ►  /o =ORGl .  ORG2/ou=XSTRl/cn=Configmation/cn=Servers/cn=EXAMPLESTRING4/cn=S 
omecorpl  Private 

MDB^  ►  / Q=OR G1 . OR G2! Q\J=XSTR1/CN=EXAMPLESTRING25/ CN=+EXSTRING2 ►  0 ►  90 1 

►  0^  1  ►  ►  11 

!0=0RG1 . 0RG2/0\J=XSTR1/CN=RECIPIENTS/CN=LASTNAME38,  FIRSTNAME38  @8351 

1 

C=C  A  -A-OR  G4.  CO  UNTR  Y;  P=OR  G 1 .  OR  G2 ;  L=EXA  MPLESTR  ING  19*9*>YYYY.D.  DO 
HH:  M:  SS  ►  ►  !0=OR  G I  .OR  G2/0  U =XS  TR  I  !C  N=C  O  N  F I G  U  R  AT  I O  N /C  N=SERVER  S/C  N =EXA 
M PLES TR ING4/C  N =SOME COR P I  PRIVATE 

MDB  ►  ►  Iq-QRG  l  .ORG2lou~S(  M/EG/H  5/en~RECIPlENTS/ea=L-LST/V.-<  A/E.E/ 
FIRSTNAME3 9,  376^0^42274^0^0^  ►  If 

/o =ORGl .  ORG2/ou=XSTRl/cn=REClT>lENTS/cn=LASTNAME40,  FIRSTNAME40  1911 
1 

C=CA;A=ORG4.  COUNTR  Y;  P=0RG1 . 0RG2;Y=EXA  MPLESTRING20  ►  9  ►  YYYY.  D.  DD 
HH:MM:S ►  ►  /O =ORGl .  Oi?G2/OU=XSTi?7/CN=CONFIGURATION/CN=SERVERS/CN=£X 
A  MPLES  T R ING4/C  N =SOM E  COR  P 1  PRIVATE 

MDB  ►  ►  /o =ORGl .  ORG2/ou=XSTRl/cn=REClFlENTS/cn=EXSTRING3  ►  0  ►  4 1 982  ►  0  ►  0  ► 
►11 

/o=ORGLORG2/ou=XSTRl/cn=KECmENTS/cn=EXSTRl  @4191 
1 

C=CA;A=ORG4.  COUNTR  Y;P=ORGl.  ORG2;Y=EXA  MPLESTRING 1 2>-  4  ►  YYYY.  D.  DD 
HH:MM:SS ►  ►  /O =ORGl .  ORG2/OU=ASTR7/CN=CONFIGURATION/CN=SERVERS/CN=£X 
A  MPLES TR ING4/C  N =S()ME COR P I  PRIVATE 

MDB ►  ►  /o=ORGl.QRG2/ou=XSTRl/cn=REC\PlEmS/cn=LASTNAME41,  FIRS TNAME4 1 
918^0^1538^0^0^^1H 

/o=GRG7.0RG2/ou=SOMETOWN/cn=RECIPIENTS/cn=+XST7?21 

1 

C=CA;A=ORG4.  COUNTR  Y;P=ORGl .  ORG2;Y=EXA  MPLESTRING2  /  ►  0  ►  YYYY.  D.  DD 
HH:MM:SS ►  ►  /O =ORGl .  ORG2/OG=XSTR 7/CN=CONF  I GU  R  ATI  ON/CN=S  HR  VERS/CN=77V 
AMPLESTRING2  7/CN=S()MEC()RP  I 

MTA  ►  ►  /o=ORGl  .ORG2 /ou=XSTRl  /cn=REC!PlE'NTS/cn=LASTNAME42  .INITIALS42  ►  0  ►  1 
932^0^0^^11 

/o =ORGl .  ORG2/ou=XSTRl/cn=KECmENTS/cn=LASTNAME43,  FIRSTNAME43  4 1 61 
1 

C=CA;A=ORG4.  COUNTRY, P=ORGl .  ORG2;L=EXAMPLESTRING22  ►  0  ►  YYYY.D.DD 
HH:MM:S>  ►  /O =ORGl . 0/?G2/OU=VST/?//CN=CONFIGURATION/CN=SERVERS/CN=£V 
A  M PL  ES  TR  ING 1  /C  N =SO  ME  COR  P 1 

MTA^  ►/o=G7?G7.07?G'2/ou=5GA/£'C’7.7"}7/cn=RECIPlENTS/cn=Z..457yV.4.A/£  FIRSTNAME. 

642  ►()►  3574 ►O^-O^  ►  11 

/o=ORGLORG2/ou=XSTRl/cn=KECmENTS/cn=LASTNAME44,  FIRSTNAME44  9331 
1 

c=CA;a =ORG4.  COUNTRY, p=0RG1.0RG2,\=EXAMPLESTRING  13 ►  1 000  ►  YYYY.D.DD 
HH:MM:SS^-/o=ORG1.0RG2/o\i=XSTRl/cn=Configma.tion/cn=Servers/cn=EXAMPLESTRING 
4/cn=Somecorpl  Private 

MDB^-/o=ORG1.0RG2/ou=XSTRl/cn=Configmation/cn=Servers/cn=EXAMPLESTRING4/cn=S 
omecorpl  Private 

MDB^  /Q=QRG1  .ORG2/Q\J=XSTRl  /CN=KECIPIENTS/CN=LASTNAME45 , 

FIRSTNAME45 179  ►  0  ►  1 808  ►  0  ►  1  ►  ►  11 

/O =ORGl . OR G2/0 U =XS T R 1  /C N=RECIPIENT S/C N =LA S TNA ME46,  FIRSTNAME46  8 1 31 
1 
Annex  I  Characteristics  of  Identification  Data  in 
Exchange  Server  5.5  Tracking  Logs 


I.1 Message ID Field

An  example  of  the  X.400-like  message  ID  is: 

C=ca;A=org4.country;P=org1.org2;L=EXAMPLESTRING23

A brief search reveals likely interpretations for the subfield names: C=country, A=ADMD, P=PRMD.
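A message ID of this kind can be split into its subfields with a few lines of Python. The sketch below assumes that subfields are separated by semicolons and that each has the form NAME=value; the function name parse_x400_like is hypothetical. Subfield names are converted to upper case because the example records use both cases.

```python
def parse_x400_like(value):
    """Split an X.400-like string into a dict of subfield name -> value."""
    fields = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # skip empty pieces, e.g. after a trailing semicolon
        name, sep, content = item.partition("=")
        if not sep:
            continue  # ignore pieces that are not NAME=value pairs
        fields[name.strip().upper()] = content.strip()
    return fields


msgid = "C=ca;A=org4.country;P=org1.org2;L=EXAMPLESTRING23"
print(parse_x400_like(msgid))
```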


I.2 Sender and Recipient Fields

1. Possibly contain multiple data items, presumably about the same person, separated by semicolons

2. Sometimes contain X.400-like data. For example:

(i) C=CA;A=ORG4.COUNTRY;P=ORG3

(ii) C=CA;A=ORG4.COUNTRY;P=ORG3;S=Smith;G=Jane;I=JB;

A brief search reveals likely interpretations for the subfield names: S=surname, G=given name, I=initials (typically, the first initial matches the given name).

3. Sometimes contain X.400/500-like data. X.500 is a standard for directory services and supports X.400. Examples of X.500-like data in the 5.5 log file:

(iii) /o=ORG1.ORG2/ou=XSTR1/cn=RECIPIENTS/cn=LASTNAME, FIRSTNAME 906

(iv) /o=ORG1.ORG2/ou=SOMECITY2/cn=RECIPIENTS/cn=LASTNAME, FIRSTNAME 433

(v) /o=ORG1.ORG2/ou=XSTR1/cn=RECIPIENTS/cn=+ORG6016016

A  brief  search  reveals  likely  interpretations  for  the  subfield  names:  o=organization, 
ou=organizational  unit  (there  can  be  more  than  one),  cn=common  name  (possibly  more  than 
one). 

A  further  search  reveals  that  these  subfield  names  are  often  associated  with  X.400,  but  when 
the  GAL  is  accessed  from  Outlook,  they  are  tagged  as  X.500  data.  For  the  purpose  of 
distinguishing  them  from  X.400  data  above,  they  are  referred  to  as  X.500  data  here. 

The  final  3-digit  numbers  in  the  above  examples  do  not  uniquely  identify  the  users,  and  there 
did  not  seem  to  be  any  initials  for  middle  names  to  help  disambiguate  user  identity. 
4. Sometimes contain a conventional SMTP email address:

(vi) DDA:SMTP=Smith.John@ca.somecorp2.com

(vii) DDA:SMTP=Smith.JW@org3.gc.ca
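The kinds of identification data listed above can be told apart with a simple classifier. The Python sketch below is illustrative only: the function name classify_identity and the category labels are hypothetical, it assumes SMTP addresses use the usual "@" separator, and it assumes that "/" does not occur inside X.500 subfield values. A string carrying both X.400-like data and a DDA:SMTP address is reported as SMTP, since the address is the more usable identifier.

```python
import re

# An SMTP address following "DDA:SMTP=", optionally wrapped in quotes.
SMTP_RE = re.compile(r"DDA:SMTP=['\"]?([^;'\"\s]+@[^;'\"\s]+)", re.IGNORECASE)


def classify_identity(value):
    """Return (kind, data) for a sender or recipient string."""
    text = value.strip()
    match = SMTP_RE.search(text)
    if match:
        return ("smtp", match.group(1))
    if text.lower().startswith("/o="):
        # X.500-like: keep the subfields in order, since cn can repeat.
        pieces = [p.partition("=") for p in text.split("/") if p]
        return ("x500", [(name.lower(), content) for name, _, content in pieces])
    if text.lower().startswith("c="):
        return ("x400", text)
    return ("unknown", text)


print(classify_identity("DDA:SMTP=Smith.John@ca.somecorp2.com"))
print(classify_identity("/o=ORG1.ORG2/ou=XSTR1/cn=RECIPIENTS/cn=LASTNAME, FIRSTNAME 906"))
```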

I.3 Outgoing Email Records

From examining outgoing (event #4) records in the example log file, the “Originator” field seems to be exclusively of the form (iii) and (iv) above; both of these are merely different examples of the same form.

The  recipients,  however,  are  of  the  forms  (iii,iv),  (v),  and  the  following  less  frequent  forms. 

1. /o=ORG1.ORG2/ou=EXSTR2 EXSTRING4/cn=RECIPIENTS/cn=DUMMYSTRING@EXSTRING4

2. /o=ORG1.ORG2/ou=XSTR1/cn=RECIPIENTS/cn=EXSTR3, Example Random String 2@101

3. /o=ORG1.ORG2/ou=XSTR1/cn=RECIPIENTS/cn=XSTR3

4. /o=ORG1.ORG2/ou=XSTR1/cn=RECIPIENTS/cn=SMITH, JOHN @240

5. /o=ORG1.ORG2/ou=EXSTRING1/cn=XMPL RNDM STR/cn=SMITH 31 @ORG3.GC.CA

6. /o=ORG1.ORG2/ou=SomeCity3/cn=Recipients/cn=SmithJW

7. C=CA;A=ORG4.COUNTRY;P=ORG1.ORG2;O=XSTR1;DDA:SMTP='Smith@state.gov';

In #1 above, it was not clear whether the dummy string corresponded to a name, user, mailbox, or something else.

Records  in  forms  (iii,iv)  obviously  correspond  to  user  mailboxes,  and  identification  can  be  made, 
with  some  ambiguity.  For  some  of  the  forms  1-7  above,  it  is  also  obvious  that  the  recipient  is  a 
user  or  a  mailbox  e.g.  #4-7.  For  others,  it  is  not  clear.  Of  particular  concern,  however,  is  the  fact 
that  records  of  form  (v)  are  quite  cryptic,  and  more  research  or  consultation  with  subject  matter 
experts  (SMEs)  is  required  to  determine  whether  identification  can  be  made  for  all  outgoing 
records. 

I.4 Incoming Email Records

From examining event #9 records in the example log file, the approximate reverse of the outgoing email records was observed. The “Originator” field seems to take the same multitude of formats seen in the recipients of outgoing records. In contrast, the recipients of incoming email records consist of very few formats:

•  A  significant  portion  in  format  (v) 
• The bulk of records in format (iii,iv)

• Variations of (iii,iv), where the numerical component at the end is prefixed and/or suffixed with a letter, and/or sometimes prefixed with “@” or

The  same  considerations  as  those  for  outgoing  records  apply  i.e.  recipients  can  mostly  be 
identified,  with  some  ambiguity,  whereas  senders  are  plagued  by  a  plethora  of  forms  of 
identification  data  with  varying  degrees  of  decipherability. 
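For records in forms (iii) and (iv) and their variations, the name and the trailing numerical component can be separated with a regular expression. The Python sketch below is a hedged illustration: the pattern and the function name split_common_name are hypothetical, and the pattern covers only an optional “@” prefix and an optional single-letter prefix or suffix on the number. As noted in this annex, the trailing number does not uniquely identify a user, so the result still needs disambiguation.

```python
import re

# Last cn subfield of the form "LASTNAME, FIRSTNAME <tag>", where <tag> is a
# number with an optional "@" prefix and an optional letter before or after.
CN_RE = re.compile(
    r"/cn=(?P<last>[^,/]+),\s*(?P<first>[^/]+?)\s+"
    r"(?P<tag>@?[A-Za-z]?\d+[A-Za-z]?)\s*$",
    re.IGNORECASE,
)


def split_common_name(dn):
    """Return (last name, first name, tag), or None if the form differs."""
    match = CN_RE.search(dn)
    if match is None:
        return None
    return (match.group("last").strip(),
            match.group("first").strip(),
            match.group("tag"))


print(split_common_name(
    "/o=ORG1.ORG2/ou=XSTR1/cn=RECIPIENTS/cn=LASTNAME, FIRSTNAME 906"))
print(split_common_name(
    "/o=ORG1.ORG2/ou=XSTR1/cn=RECIPIENTS/cn=SMITH, JOHN @240"))
```

Records that return None, such as those of form (v), would be set aside for manual review or consultation with SMEs.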
Annex  J  Contacts 


J.1  DND  Personnel 

•  Mike  (Michel)  Manor 

■ POC for DIMEI 3-6-3 (formerly 3-6-4) regarding acquisition of tracking log files

•  Maj  Mohammad  Chaudhary 
DIMEI  3-6 

■  Authority  for  release  of  tracking  log  files 

•  Kwok-Fai  Ela 
Contractor 
DIMEI  3-6-4-C 

■  Seemingly  the  person  most  knowledgeable  in  Exchange  Server®  encountered  in- 
house 

•  Donald  Messier  (Major,  retired) 

■ Participated in 2004 study by DIMEI 3-4 of server-to-server traffic using tracking logs. (All of DIMEI has since consolidated into a single organization, DIMEI.)

■  Currently  in  DWAN  GAL  as  contractor  in  DIMEI  7 

■  OPI  who  did  the  work: 

Cherif  Djerboua 
Microsoft™ 

cherif.djerboua@microsoft.com 
♦  No  longer  with  DND  full  time 

•  Sgt  Charles  D.  Dechamp 

764  Comm.  Sqn.  DEMS  Supv. 

■ Before Remi Lagace transferred from SEAMS to 76 Comm, he suggested contacting Sgt Dechamp for information on how JTFN and Canada Command personnel were distributed among the servers

■  Provided  the  links  for  downloadable  GAL  data 

•  Brian  Woolsey 

DIMEI  2  (formerly  DDCEI  2) 

Tel:  613-944-4712  /Cell:  613-220-0610 

Program  Manager,  Software  Integration  Engineering 

Defence  Software  Baseline  &  Life  Cycle  Product  Management 

■  Authority  for  accessing  Microsoft™  technical  support 
J.2  Microsoft™  Support 

•  Pierre  Major,  MCSE 
Technical  Account  Manager 
Microsoft™  Premier  Support 
Email:  Pmajor@Microsoft.Com 
Phone:  613-232-6606 

Cell:  613-298-4582 

■  Initial  POC 

•  Amy-Leigh  B.  Mack 
Exchange  Support  Professional 
Microsoft™  Enterprise  Messaging  Support 
Email:  amyma@microsoft.com 

Phone:  980-776-8307 

Office  hours:  Mon.-Fri.  8am-5pm  EST 

■  Next  POC 

•  Christopher  Nguyen 

Microsoft™  Enterprise  Business  Application 
Messaging 

E-mail: v-11chng@mssupport.microsoft.com
Phone:  416-246-5580  ext.  5471 
Hours:  Mon.-Fri.  9am-6pm  EST 

■  Pierre  Major’s  co-facilitator  for  meeting  with  “Mark”,  a  former  engineer  with 
exposure  to  development  of  Exchange  Server®  5.5  tracking  logs. 

J.3  Quest®  Software 

•  Jill  Kaser 

Microsoft™  Exchange  Specialist 
Jill.Kaser@quest.com 

800-263-0036  ext.  4726 
614-726-4726  direct 

■  Main  POC  for  Quest®. 

•  Rob  Sargent 

Head,  product  team  (Product  manager) 

Kanata 

613-270-1500 

■  Technically  knowledgeable 

•  Eric  Hibar 
System  Consultant 
Columbus,  Ohio 

■  Technically  knowledgeable 
•  Pam  Turenne 

Sales  account  manager  for  DND 
Ottawa 


J.4  SNA  for  Multi-National  Experiment  (MNE)  4 

•  Hannah  State-Davey 
HMSDAVEY@qinetiq.com 

■  Social  network  analyst  for  MNE  4 

•  Mark  Round 
MDROUND@qinetiq.com 

■  Social  network  analyst  for  MNE  4 

•  Neil  G.  Verrall 
NGVERRALL@mail.dstl.gov.uk 

■  U.K.  lead  analyst  for  MNE  4 
List  of  symbols/abbreviations/acronyms/initialisms 


AD        Active Directory

ASIWG     Arctic Surveillance Interdepartmental Working Group

C&S       Command and Sense (team)

CFBLNet   Combined Federated Battle Lab Network

CFEC      Canadian Forces Experimentation Centre

CFWC      Canadian Forces Warfare Centre

CSV       Comma separated values

DIMEI     Director Information Management Engineering and Integration

DND       Department of National Defence

DRDC      Defence Research and Development Canada

DRDKIM    Director Research and Development Knowledge and Information
          Management

DWAN      Defence Wide Area Network

EXORT     Experimentation Operational Research Team

GAL       Global Address List

GB        Gigabyte

ID        Identity

IIS       Microsoft™ Internet Information Services

IP        Internet Protocol (address)

IT        Information Technology

JTFN      Joint Task Force North

K         Kilo (1000)

MB        Megabyte

MNE       Multi-National Experiment

MSDE      Microsoft™ Data Engine
          Microsoft™ Desktop Engine
          Microsoft™ SQL Server Desktop Engine

MTA       Mail Transfer Agent

MySQL     An SQL Database Management System

OGD       Other Government Department

OR        Operational Research

POC       Point of Contact

POP3      Post Office Protocol version 3

PSEPC     Public Safety and Emergency Preparedness Canada

RCMP      Royal Canadian Mounted Police

RFC       Request For Comment

RFC2822   Internet Engineering Task Force RFC document defining the format
          of SMTP email

S&T       Science and Technology

SEAMS     Synthetic Environment and Modelling & Simulation

SME       Subject Matter Expert

SMTP      Simple Mail Transfer Protocol

SNA       Social Network Analysis

SOP       Standard Operating Procedure

SQL       Structured Query Language

VB        Microsoft™ Visual Basic

X.400     Message exchange standard

X.500     Series of computer networking standards for electronic directory
          services